systemd collector hangs and opens a lot of dbus connections #1535

Closed
jobec opened this issue Nov 8, 2019 · 9 comments

jobec commented Nov 8, 2019

Host operating system: output of uname -a

Linux xxxx 3.10.0-957.10.1.el7.x86_64 #1 SMP Thu Feb 7 07:12:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.18.1 (branch: HEAD, revision: 3db7773)
build user: root@b50852a1acba
build date: 20190604-16:41:18
go version: go1.12.5

node_exporter command line flags

--collector.systemd

Are you running node_exporter in Docker?

No

What did you do that produced an error?

Retrieve systemd metrics on a system where the dbus/systemd-logind got stuck for some reason.
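
(For reference, that just means scraping the metrics endpoint; :9100 is the listen address from the log below, and the grep only picks out the systemd series. While the collector is stuck, this never completes without the client-side timeout:)

curl --max-time 60 -s http://localhost:9100/metrics | grep '^node_systemd_'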

What did you expect to see?

In the previous version (0.17.0), the exporter still returned the other metrics even when something was wrong with collecting the systemd metrics.

What did you see instead?

In version 0.18.1 the behavior changed and the exporter just hangs, never returning anything.

Restarting the two services (dbus and systemd-logind) makes it return immediately and spit this out on the console (the restart commands are shown after the log):

INFO[0000] Starting node_exporter (version=0.18.1, branch=HEAD, revision=3db77732e925c08f675d7404a8c46466b2ece83e)  source="node_exporter.go:156"
INFO[0000] Build context (go=go1.12.5, user=root@b50852a1acba, date=20190604-16:41:18)  source="node_exporter.go:157"
INFO[0000] Enabled collectors:                           source="node_exporter.go:97"
INFO[0000]  - arp                                        source="node_exporter.go:104"
INFO[0000]  - bcache                                     source="node_exporter.go:104"
INFO[0000]  - bonding                                    source="node_exporter.go:104"
INFO[0000]  - conntrack                                  source="node_exporter.go:104"
INFO[0000]  - cpu                                        source="node_exporter.go:104"
INFO[0000]  - cpufreq                                    source="node_exporter.go:104"
INFO[0000]  - diskstats                                  source="node_exporter.go:104"
INFO[0000]  - edac                                       source="node_exporter.go:104"
INFO[0000]  - entropy                                    source="node_exporter.go:104"
INFO[0000]  - filefd                                     source="node_exporter.go:104"
INFO[0000]  - filesystem                                 source="node_exporter.go:104"
INFO[0000]  - hwmon                                      source="node_exporter.go:104"
INFO[0000]  - infiniband                                 source="node_exporter.go:104"
INFO[0000]  - ipvs                                       source="node_exporter.go:104"
INFO[0000]  - loadavg                                    source="node_exporter.go:104"
INFO[0000]  - mdadm                                      source="node_exporter.go:104"
INFO[0000]  - meminfo                                    source="node_exporter.go:104"
INFO[0000]  - netclass                                   source="node_exporter.go:104"
INFO[0000]  - netdev                                     source="node_exporter.go:104"
INFO[0000]  - netstat                                    source="node_exporter.go:104"
INFO[0000]  - nfs                                        source="node_exporter.go:104"
INFO[0000]  - nfsd                                       source="node_exporter.go:104"
INFO[0000]  - pressure                                   source="node_exporter.go:104"
INFO[0000]  - sockstat                                   source="node_exporter.go:104"
INFO[0000]  - stat                                       source="node_exporter.go:104"
INFO[0000]  - systemd                                    source="node_exporter.go:104"
INFO[0000]  - textfile                                   source="node_exporter.go:104"
INFO[0000]  - time                                       source="node_exporter.go:104"
INFO[0000]  - timex                                      source="node_exporter.go:104"
INFO[0000]  - uname                                      source="node_exporter.go:104"
INFO[0000]  - vmstat                                     source="node_exporter.go:104"
INFO[0000]  - xfs                                        source="node_exporter.go:104"
INFO[0000]  - zfs                                        source="node_exporter.go:104"
INFO[0000] Listening on :9100                            source="node_exporter.go:170"
ERRO[0511] ERROR: systemd collector failed after 480.956166s: couldn't get units: read unix @->/run/dbus/system_bus_socket: EOF  source="collector.go:132"
ERRO[0511] ERROR: systemd collector failed after 486.027451s: couldn't get units: read unix @->/run/dbus/system_bus_socket: EOF  source="collector.go:132"
ERRO[0511] ERROR: systemd collector failed after 486.342074s: couldn't get units: read unix @->/run/dbus/system_bus_socket: EOF  source="collector.go:132"
ERRO[0511] ERROR: systemd collector failed after 495.666704s: couldn't get units: read unix @->/run/dbus/system_bus_socket: EOF  source="collector.go:132"
ERRO[0511] ERROR: systemd collector failed after 495.958417s: couldn't get units: read unix @->/run/dbus/system_bus_socket: EOF  source="collector.go:132"
ERRO[0511] ERROR: systemd collector failed after 501.043409s: couldn't get units: read unix @->/run/dbus/system_bus_socket: EOF  source="collector.go:132"
ERRO[0511] ERROR: systemd collector failed after 501.346704s: couldn't get units: read unix @->/run/dbus/system_bus_socket: EOF  source="collector.go:132"
ERRO[0511] ERROR: systemd collector failed after 507.093354s: couldn't get units: read unix @->/run/dbus/system_bus_socket: EOF  source="collector.go:132"
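
(For reference, the restart mentioned above amounts to something like the following; the exact unit names may vary per distro:)

systemctl restart dbus.service systemd-logind.service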

During such events, when something goes wrong with systemd, there are also a lot of connections from node_exporter to the bus:

xxxx@xxxxx# busctl --list --no-pager
NAME                                         PID PROCESS         USER             CONNECTION    UNIT                      SESSION    DESCRIPTION
:1.0                                           1 systemd         root             :1.0          -                         -          -
:1.1                                        5071 polkitd         polkitd          :1.1          polkit.service            -          -
:1.12429174                                91431 sedispatch      root             :1.12429174   auditd.service            -          -
:1.12429175                                29206 smbd            root             :1.12429175   smb.service               -          -
:1.12748849                               124924 node_exporter   xxxxxxx          :1.12748849   sshd.service              -          -
:1.12748850                               124924 node_exporter   xxxxxxx          :1.12748850   sshd.service              -          -
:1.12748851                               124924 node_exporter   xxxxxxx          :1.12748851   sshd.service              -          -
:1.12748852                               124924 node_exporter   xxxxxxx          :1.12748852   sshd.service              -          -
:1.12748853                               124924 node_exporter   xxxxxxx          :1.12748853   sshd.service              -          -
:1.12748854                               124924 node_exporter   xxxxxxx          :1.12748854   sshd.service              -          -
:1.12748855                               124924 node_exporter   xxxxxxx          :1.12748855   sshd.service              -          -
:1.12748856                               124924 node_exporter   xxxxxxx          :1.12748856   sshd.service              -          -
:1.12748857                               124924 node_exporter   xxxxxxx          :1.12748857   sshd.service              -          -
:1.12748858                               124924 node_exporter   xxxxxxx          :1.12748858   sshd.service              -          -
:1.12748859                               124924 node_exporter   xxxxxxx          :1.12748859   sshd.service              -          -
:1.12748860                               124924 node_exporter   xxxxxxx          :1.12748860   sshd.service              -          -
:1.12748861                               124924 node_exporter   xxxxxxx          :1.12748861   sshd.service              -          -
:1.12748862                               124924 node_exporter   xxxxxxx          :1.12748862   sshd.service              -          -
:1.12748864                               124924 node_exporter   xxxxxxx          :1.12748864   sshd.service              -          -
:1.12748865                               124924 node_exporter   xxxxxxx          :1.12748865   sshd.service              -          -
:1.12748866                               124924 node_exporter   xxxxxxx          :1.12748866   sshd.service              -          -
:1.12748867                               124924 node_exporter   xxxxxxx          :1.12748867   sshd.service              -          -
:1.12748868                               124924 node_exporter   xxxxxxx          :1.12748868   sshd.service              -          -
:1.12748869                               124924 node_exporter   xxxxxxx          :1.12748869   sshd.service              -          -
:1.12748870                               124924 node_exporter   xxxxxxx          :1.12748870   sshd.service              -          -
:1.12748871                               124924 node_exporter   xxxxxxx          :1.12748871   sshd.service              -          -
:1.12748874                               124924 node_exporter   xxxxxxx          :1.12748874   sshd.service              -          -
...
xxxx@xxxxx# busctl --list --no-pager | grep node_exporter | wc -l
80

I'm not sure yet whether node_exporter is causing dbus/systemd to misbehave or is merely a victim of it.
Either way, it looks like unresponsive connections are not closed properly before a new one is opened, causing stale connections to pile up.
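
(A quick way to watch the pile-up grow across scrapes, reusing the busctl command from above:)

watch -n 30 'busctl --list --no-pager | grep node_exporter | wc -l'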

DonatasFe commented May 18, 2020

I have a similar issue and had to disable the systemd collector.

node_exporter, version 0.18.1 (branch: HEAD, revision: 3db7773)

5.3.11-1.el7.elrepo.x86_64

ls -la /proc/260063/fd | grep socket | wc -l
125

netstat -anp |grep node-exporte | wc -l
85

@discordianfish (Member)

Well, somebody still needs to figure out whether this is node_exporter or dbus acting up.

jobec (Author) commented May 20, 2020

@DonatasFe what was your RHEL version?
It looks like we haven't seen it since a recent RHEL upgrade.

@discordianfish (Member)

Yeah, let us know; if it's solved in a more recent version I think this can be closed.

@DonatasFe

> @DonatasFe what was your RHEL version?
> It looks like we haven't seen it since a recent RHEL upgrade.

It is 5.3.11-1.el7.elrepo.x86_64

jobec (Author) commented May 21, 2020

I actually meant the version of your Red Hat install.
I can see it's 7.x from your output, but what's the minor version?

@DonatasFe

> I actually meant the version of your Red Hat install.
> I can see it's 7.x from your output, but what's the minor version?

Here it is: CentOS Linux release 7.4.1708 (Core)

@discordianfish (Member)

Going to assume this is fixed in newer versions.

@graudeejs

I just had this issue on CentOS 7.5 with node_exporter 1.1.2. Restarting systemd with systemctl daemon-reexec solved the issue instantly.

prometheus locked this issue as resolved and limited conversation to collaborators on Jul 26, 2023