Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd input not showing in output #497

Closed
trstringer opened this issue Jan 30, 2018 · 29 comments
Closed

systemd input not showing in output #497

trstringer opened this issue Jan 30, 2018 · 29 comments
Assignees

Comments

@trstringer
Copy link

I have the following configuration:

[SERVICE]                              
    Flush 1                            
    Log_Level debug                    

[INPUT]                                
    Name systemd                       
    Tag host.*                         
    Systemd_Filter _SYSTEMD_UNIT=bashdaemon.service                            

[OUTPUT]                               
    Name stdout                        
    Match *   

For some reason, Fluent Bit isn't picking up the journal entries for the bashdaemon.service unit. This screenshot below (direct link) shows this:

  • Top terminal: fluent-bit running the configuration
  • Bottom left: cat'ing out the configuration (just for verification purposes)
  • Bottom right: journalctl output tailing with a filter on the bashdaemon.service unit

Any thoughts on why Fluent Bit may not be ingesting these systemd unit journal entries? Or how to troubleshoot this?

Imgur

Thanks in advance!

@trstringer
Copy link
Author

For what it's worth, this is reproducible without using a custom systemd unit:

$ ./fluent-bit -i systemd -o stdout -p "_SYSTEMD_UNIT=sshd.service"

Empty output when sshd events are entered into journald.

@edsiper
Copy link
Member

edsiper commented Jan 31, 2018

the following command works properly:

$ fluent-bit -i systemd -p "_SYSTEMD_UNIT=sshd.service" -o stdout -f 1

note that -p needs to be right after the plugin that you want to modify the property. Adding -p after -o stdout means "set that property to stdout".

@trstringer
Copy link
Author

@edsiper ah ok. Unfortunately when I use your exact command, I still get nothing. Looks like the systemd input plugin is not working altogether. Debug output is showing me nothing relevant. What's the way to troubleshoot this? I must be missing something.

Or I should just use the rpm that you package up 😄

@edsiper
Copy link
Member

edsiper commented Jan 31, 2018

can you try adding a path to the systemd journal files ?, e.g:

$ fluent-bit -i systemd -p path=/var/log/journal -p "_SYSTEMD_UNIT=sshd.service" -o stdout -f 1

@trstringer
Copy link
Author

Unfortunately that didn't work either (my path was /run/log/journal).

@edsiper
Copy link
Member

edsiper commented Jan 31, 2018

what about if you append -vv at the end of the command line, do you see any extra message ?

@edsiper
Copy link
Member

edsiper commented Feb 1, 2018

@trstringer please give me the details of that Linux distro (specific version) to try to replicate the problem

@edsiper edsiper self-assigned this Feb 1, 2018
@trstringer
Copy link
Author

@edsiper sorry for the delay. Regarding -vv, doesn't seem like anything to go off of:

Fluent-Bit v0.13.0                     
Copyright (C) Treasure Data            

[2018/02/01 21:02:54] [ info] [engine] started                                 
[2018/02/01 21:02:54] [debug] [router] default match rule systemd.0:stdout.0 

Regarding the Linux machine itself:

  • CentOS Linux release 7.4.1708 (Core)
  • Linux fluentbitvm2 3.10.0-693.17.1.el7.x86_64 Fix include paths #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@s-kostyaev
Copy link

s-kostyaev commented Feb 6, 2018

Same here.

~ master ±10 ♦ docker run -v /var/log/journal:/var/log/journal:ro -v /etc/machine-id:/etc/machine-id:ro -it fluent/fluent-bit /fluent-bit/bin/fluent-bit -i systemd -p path=/var/log/journal -p tag='host.*' -o stdout -vv
Fluent-Bit v0.12.12
Copyright (C) Treasure Data

[2018/02/06 03:17:51] [ info] [engine] started
[2018/02/06 03:17:51] [debug] [router] default match rule systemd.0:stdout.0

Host OS ubuntu artful:

~ master ±10 ♦ uname -a
Linux nskdev2 4.13.0-32-generic #35-Ubuntu SMP Thu Jan 25 09:13:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
~ master ±10 ♦ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=17.10
DISTRIB_CODENAME=artful
DISTRIB_DESCRIPTION="Ubuntu 17.10"
~ master ±10 ♦ docker -v
Docker version 18.01.0-ce, build 03596f5
~ master ±10 ♦ systemd --version
systemd 234
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN default-hierarchy=hybrid

In another console:

~ master ±10 ♦ logger "some test message"
~ master ±10 ♦ logger "some test message"

I can see this messages with journalctl:

~ master ±10 ♦ journalctl -f
Hint: You are currently not seeing messages from other users and the system.
      Users in groups 'adm', 'systemd-journal' can see all messages.
      Pass -q to turn off this notice.
-- Logs begin at Mon 2017-12-18 13:04:29 +07. --
фев 06 10:08:26 nskdev2 feofan[21866]: some test message
фев 06 10:13:30 nskdev2 feofan[523]: some test message
фев 06 10:13:32 nskdev2 feofan[580]: some test message
фев 06 10:13:33 nskdev2 feofan[643]: some test message
фев 06 10:16:23 nskdev2 feofan[7177]: some test message
фев 06 10:16:24 nskdev2 feofan[7229]: some test message
фев 06 10:17:15 nskdev2 feofan[9418]: some test message
фев 06 10:17:26 nskdev2 feofan[9862]: some test message
фев 06 10:17:54 nskdev2 feofan[11087]: some test message
фев 06 10:17:55 nskdev2 feofan[11278]: some test message
фев 06 10:21:05 nskdev2 feofan[18115]: some test message

Also with strace I can see memmap & memunmap of updated journal file.

@s-kostyaev
Copy link

s-kostyaev commented Feb 6, 2018

On this machine:

# uname -a
Linux k8s-dev-node-1 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
# docker -v
Docker version 17.03.1-ce, build c6d412e
# systemd --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN

all works fine

@edsiper
Copy link
Member

edsiper commented Feb 6, 2018

@trstringer @s-kostyaev

there is a new fix associated to systemd, would you please rebuild GIT master and test if the problem you describe is fixed ?

ref: #500

@s-kostyaev
Copy link

s-kostyaev commented Feb 7, 2018

@edsiper After rebuilding docker image from fluent-bit git master problem is not solved:

Fluent-Bit v0.13.0
Copyright (C) Treasure Data

[2018/02/07 03:33:35] [ info] [engine] started
[2018/02/07 03:33:35] [debug] [router] default match rule systemd.0:stdout.0

@edsiper
Copy link
Member

edsiper commented Feb 7, 2018

@s-kostyaev did you check that journalctl works inside the docker image ?

@s-kostyaev
Copy link

s-kostyaev commented Feb 7, 2018

there is no journalctl inside container https://github.com/fluent/fluent-bit-docker-image/blob/master/0.12/Dockerfile

FROM gcr.io/google-containers/debian-base-amd64:0.1
MAINTAINER Eduardo Silva <[email protected]>
LABEL Description="Fluent Bit docker image" Vendor="Fluent Organization" Version="1.1"

# Fluent Bit version
ENV FLB_MAJOR 0
ENV FLB_MINOR 12
ENV FLB_PATCH 5
ENV FLB_VERSION master

ENV FLB_TARBALL http://github.com/fluent/fluent-bit/archive/master.zip
# v$FLB_VERSION.zip

RUN mkdir -p /fluent-bit/bin /fluent-bit/etc /fluent-bit/log

RUN apt-get -qq update \
    && apt-get install -y -qq \
       build-essential \
       cmake \
       make \
       wget \
       unzip \
       libsystemd-dev \
    && wget -O "/tmp/fluent-bit-${FLB_VERSION}.zip" ${FLB_TARBALL} \
    && cd /tmp && unzip "fluent-bit-$FLB_VERSION.zip" \
    && cd "fluent-bit-$FLB_VERSION"/build/ \
    && cmake -DFLB_DEBUG=On -DFLB_TRACE=On -DFLB_JEMALLOC=On -DFLB_BUFFERING=On ../ \
    && make \
    && install bin/fluent-bit /fluent-bit/bin/ \
    && apt-get remove --purge --auto-remove -y -qq \
       build-essential \
       cmake \
       make \
       wget \
       unzip \
       libsystemd-dev \
       bzip2 \
       openssl \
       manpages \
    && apt-get install -y -qq --no-install-recommends ca-certificates \
    && rm -rf /tmp/*

# Configuration files
COPY fluent-bit.conf /fluent-bit/etc/
COPY parsers.conf /fluent-bit/etc/
COPY parsers_java.conf /fluent-bit/etc/

# Entry point
CMD ["/fluent-bit/bin/fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.conf"]

@edsiper
Copy link
Member

edsiper commented Feb 7, 2018

@s-kostyaev make sure fluent-bit have access to the journal files (file system), otherwise it will not work.

@trstringer
Copy link
Author

That did work for me! Worth noting, I had to run fluent-bit with root privileges. I'm guessing that has to do with journal access requirements.

@edsiper
Copy link
Member

edsiper commented Feb 7, 2018

@trstringer did you use the latest from GIT master ?

@trstringer
Copy link
Author

@edsiper yes I did. It worked running as root, but without root privs it didn't work. That's expected though right?

@edsiper
Copy link
Member

edsiper commented Feb 7, 2018

well, I think it depends on how each distribution set permissions in Journal files. I am happy to hear the problem has gone :)

@s-kostyaev
Copy link

@edsiper I can see journal files from container:

~/fluent-bit-docker-image/0.12 master ±1 ♦ docker run -v /var/log/journal:/var/log/journal:ro -it 1f33b133c419 ls -l /var/log/journal/de698b9175c74d4496ffd64ae6a13751/
total 147468
-rw-r-----+ 1 root input 16777216 Feb  8 03:19 system.journal
-rw-r-----+ 1 root input 41943040 Dec 27 06:16 [email protected]~
-rw-r-----+ 1 root input 50331648 Jan 26 16:47 system@6a47d9fc34684fa5842068fb3cadc880-0000000000000001-0005614c5757abbc.journal
-rw-r-----+ 1 root input  8388608 Feb  8 03:18 user-1002.journal
-rw-r-----+ 1 root root   8388608 Dec 27 06:16 [email protected]~
-rw-r-----+ 1 root input  8388608 Jan 26 11:10 user-1002@859fb26c868449158c1213a35dff3679-0000000000000719-0005614c57f2d944.journal
-rw-r-----+ 1 root input  8388608 Feb  8 03:18 user-65534.journal
-rw-r-----+ 1 root input  8388608 Jan 26 11:10 user-65534@c150efb39cab4b82bfb7f6f6d267e3e9-000000000000a938-000563ab3b39dfa0.journal

@edsiper
Copy link
Member

edsiper commented Feb 8, 2018

@s-kostyaev since fluent bit is not part of "input or root" groups, it cannot get access to the files.

@s-kostyaev
Copy link

@edsiper fluent-bit inside container is running by root user.

@codepainters
Copy link

I just hit the very same issue. I've done the following experiment (inside container):

strace -o /var/log/s -s 1000 -f -tt /fluent-bit/bin/fluent-bit \
    -i systemd  -p tag='host.*'  -o stdout

Logs show that it tries to watch /var/log:

336   15:29:10.815384 open("/var/log/journal", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 18
336   15:29:10.815468 fstat(18, {st_mode=S_IFDIR|S_ISGID|0755, st_size=4096, ...}) = 0
336   15:29:10.815551 inotify_add_watch(17, "/proc/self/fd/18", IN_MODIFY|IN_ATTRIB|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_ONLYDIR) = 1
336   15:29:10.815659 getdents(18, /* 3 entries */, 32768) = 104
336   15:29:10.815743 getdents(18, /* 0 entries */, 32768) = 0
336   15:29:10.815813 fstatfs(18, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=118741870, f_bfree=113927537, f_bavail=107890029, f_files=30171136, f_ffree=29481037, f_fsid={val=[2733853089, 1238983524]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_RELATIME}) = 0
336   15:29:10.815905 close(18)         = 0

But it never gets any notification (I'm sure there were updates while tracing):

grep read.17 /var/log/s
336   15:29:10.817152 read(17, 0x7fff83e3bf30, 272) = -1 EAGAIN (Resource temporarily unavailable)
336   15:29:11.000962 read(17, 0x7fff83e3bf80, 272) = -1 EAGAIN (Resource temporarily unavailable)
336   15:29:12.000300 read(17, 0x7fff83e3bf80, 272) = -1 EAGAIN (Resource temporarily unavailable)
336   15:29:13.000613 read(17, 0x7fff83e3bf80, 272) = -1 EAGAIN (Resource temporarily unavailable)
336   15:29:14.000912 read(17, 0x7fff83e3bf80, 272) = -1 EAGAIN (Resource temporarily unavailable)

...which make sense, given what inotify manual says:

Inotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional watches must be created.

In fact, it starts to work if I specify -p path=/var/log/journal/8793f43cadac4eb2be55ddc2954fc74f/ on the command line (that's the directory on this particular host where actual journal files are stored, the host runs Ubuntu 18.04). Shouldn't the systemd plugin descend into subdirectories of a given path?

@codepainters
Copy link

I found a little time to dig into the source code, and the issue is obvious now.

The plugin uses systemd API (namely sd_journal_* functions), systemd itself relies on a concept of unique machine ID, stored in /etc/machine-id (see machine-id) - this is exactly the hex number in the journal directory path (I suppose this is to allow for many machines to store logs into a single shared network volume).

And now - /etc/machine-id inside a docker container is either missing or different, hence the API fails to open host journal files. A quick hack (copy /etc/machine-id from host into the container) confirmed this.

I'm not sure what is the best solution here, but mounting the file from host into container seems the best option (e.g. docker ... -v /etc/machine-id:/etc/machine-id:ro ...)

@lucaim
Copy link

lucaim commented Jun 8, 2018

I can confirm that mounting the file /etc/machine-id read-only works. Example of what needs to be added:

      containers:
        volumeMounts:
        - name: etcmachineid
          mountPath: /etc/machine-id
          readOnly: true
      volumes:
      - name: etcmachineid
        hostPath:
          path: /etc/machine-id
          type: File

@briantopping
Copy link

To add notes for others here, I ran into problems here as well. I didn't need to add SYS_ADMIN privs to the container to run it, just mount mount the machine ID file and /var/log/ directory in the fluent bit container.

lzecca78 added a commit to sighupio/fury-kubernetes-logging that referenced this issue May 22, 2020
angelbarrera92 pushed a commit to sighupio/fury-kubernetes-logging that referenced this issue Jun 5, 2020
@lalithvaka
Copy link

I can confirm that mounting the file /etc/machine-id read-only works. Example of what needs to be added:

      containers:
        volumeMounts:
        - name: etcmachineid
          mountPath: /etc/machine-id
          readOnly: true
      volumes:
      - name: etcmachineid
        hostPath:
          path: /etc/machine-id
          type: File

@lucaim , what should be the path in the Config map/ INPUT ?

[INPUT]
        Name              systemd
        Tag               host.*
        Path              /run/log/journal/<etcmachineid> 
        Systemd_Filter    _SYSTEMD_UNIT=docker.service
        Systemd_Filter    _SYSTEMD_UNIT=kubelet.service

@joshgc
Copy link

joshgc commented Sep 9, 2022

Sorry to resurrect a dead thread, but since I found this one, maybe others will too.

I'm still facing the same problem with /etc/machine-id and /var/log/journal mounted in the container. Surprisingly my strace shows that fluent-bit in the container IS getting notified correctly, it mmaps the system.journal then immediately munmaps it and never prints anything out. I ran fluent-bit for a minute and it didn't print anything and journalctl/systemd-cat on the host OS both confirm messages were going by. I've also tried this experiment without the Read_From_Tail. Any advise would be amazing

strace -o straceout -s 1000 -f -tt /fluent-bit/bin/fluent-bit -i systemd -p Read_From_Tail=On -o stdout

Here's some juicy lines from my straceout

$ grep inotify straceout
165   17:02:06.202056 inotify_init1(IN_NONBLOCK|IN_CLOEXEC) = 20
165   17:02:06.202120 inotify_add_watch(20, "/proc/self/fd/21", IN_MODIFY|IN_ATTRIB|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_ONLYDIR) = 1
165   17:02:06.202262 inotify_add_watch(20, "/proc/self/fd/21", IN_MODIFY|IN_ATTRIB|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_ONLYDIR) = 2
165   17:02:06.202342 inotify_add_watch(20, "/proc/self/fd/22", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF|IN_UNMOUNT|IN_ONLYDIR) = 3
165   17:03:09.968490 inotify_rm_watch(20, 1 <unfinished ...>
...

And then an example successful, ie no EAGAIN, read from fd=20

165   17:02:16.664716 read(20, "\3\0\0\0\2\0\0\0\0\0\0\0 \0\0\0user-375336.journal\0\0\0\0\0\0\0\0\0\0\0\0\0", 272) = 48
165   17:02:16.664873 openat(AT_FDCWD, "/var/log/journal/59f82edf0a62455696e0899bc0a4dbf4/user-375336.journal", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 28
165   17:02:16.665199 fcntl(28, F_GETFL) = 0x8800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE)
165   17:02:16.665308 fcntl(28, F_SETFL, O_RDONLY|O_LARGEFILE) = 0
165   17:02:16.665370 fstat(28, {st_mode=S_IFREG|0640, st_size=8388608, ...}) = 0
165   17:02:16.665457 fstat(28, {st_mode=S_IFREG|0640, st_size=8388608, ...}) = 0
165   17:02:16.665521 mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 28, 0) = 0x7fbc8a1ff000
165   17:02:16.665621 munmap(0x7fbc8a1ff000, 8388608) = 0
165   17:02:16.665693 close(28)         = 0
165   17:02:16.665751 read(20, 0x7fbc8bff25c0, 272) = -1 EAGAIN (Resource temporarily unavailable)

I'm using the Dockerfile.x86_64 from the github for FLB_VERSION 1.7.1. My command to launch the container is

sudo podman run -u root \
    --cap-add SYS_PTRACE \
    --interactive --tty \
    --entrypoint /bin/bash \
    -p 24224:24224 \
    -p 2020:2020 \
    --log-driver=none \
    --volume /home/joshgc/fluent-bit-docker-image/conf/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf \
    --volume /etc/machine-id:/etc/machine-id:ro \
    --volume /run/log/journal:/run/log/journal \
    --volume /var/log:/var/log \
    6989ad3695877bc1416d882789af79c4bc3ded2c4b2061983ed26ec6b885a838

Extra notes:

  1. Running podman without sudo still allows the inotify to receive an event but then it cannot open the actual system.journal file. The presence or absence of -u root did not change any behavior
  2. Running docker instead of podman doesn't change this behavior

@joshgc
Copy link

joshgc commented Sep 13, 2022

Okay I believe I have resolved this as an upstream issue with systemd. It looks like those package maintainers changed the default binary storage format between buster (the fluent-bit container) and bullseye (my host machine). Once I rebuilt Dockerfile.x86_64 with bullseye-slim the problem went away.

For future techno-archeologists, I installed the systemd and tried to used the container's journalctl to debug the issue (which gave more useful prompts about the binary format).

See upstream discussion: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=968055

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants