Linux has counters in /proc that show how often each kind of softirq handler runs. We would like to collect these counters to get a hint of where the kernel spends time during a recurring incident on some of our servers. node_exporter currently does not expose them.
There are two ways we could add the counters:
/proc/stat: prometheus/procfs already parses the softirq line of this file, which gives system-wide softirq counts.
/proc/softirqs: this file gives counts per CPU.
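To show what the global option contains: the softirq line of /proc/stat is a total followed by one system-wide count per softirq type. The sketch below is a minimal stdlib-only parser for that line, not the procfs implementation. It assumes the column order of recent kernels (HI, TIMER, NET_TX, NET_RX, BLOCK, IRQ_POLL, TASKLET, SCHED, HRTIMER, RCU; older kernels call IRQ_POLL "BLOCK_IOPOLL"). The function name parseSoftirqLine is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// softirqNames lists the per-type columns of the "softirq" line in
// /proc/stat, in the order assumed here (after the leading total).
var softirqNames = []string{
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
}

// parseSoftirqLine (hypothetical helper) parses a line such as
// "softirq <total> <hi> <timer> ..." and returns the total plus a map
// from softirq type to its system-wide count.
func parseSoftirqLine(line string) (uint64, map[string]uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 || fields[0] != "softirq" {
		return 0, nil, fmt.Errorf("not a softirq line: %q", line)
	}
	values := make([]uint64, 0, len(fields)-1)
	for _, f := range fields[1:] {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return 0, nil, fmt.Errorf("bad value %q: %w", f, err)
		}
		values = append(values, v)
	}
	counts := make(map[string]uint64)
	// values[0] is the total; the remaining columns map onto softirqNames.
	for i, v := range values[1:] {
		if i >= len(softirqNames) {
			break // ignore any extra columns a newer kernel might add
		}
		counts[softirqNames[i]] = v
	}
	return values[0], counts, nil
}

func main() {
	total, counts, err := parseSoftirqLine("softirq 100 1 40 2 30 5 0 3 10 0 9")
	if err != nil {
		panic(err)
	}
	fmt.Println(total, counts["TIMER"], counts["NET_RX"])
}
```

This yields one value per softirq type for the whole machine, so the number of series does not depend on the CPU count.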
I don't have a preference between exposing the global counters and the per-CPU counters. Exposing the global counters is less development work (we already parse /proc/stat) and it saves a syscall, since no extra file has to be read. On the other hand, if someone later wants the per-CPU counters, it would be better to spend the effort on exposing them from the start.
If I were to submit a PR, should it be with the global counters, or per-CPU?
Does this fit under an existing collector, or should we add a new one?
We expose the softirq time per CPU from /proc/stat as part of node_cpu_seconds_total. But that's only the total per CPU, not the individual softirq types. I guess that's what you're looking for.
Looking at the available data, enabling the per-softirq, per-CPU combination would add a lot of cardinality: one series per softirq type per CPU.
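To make the cardinality concern concrete: /proc/softirqs has a header row with one CPU column per CPU and one row per softirq type, so exposing it directly produces (types × CPUs) series. The sketch below is a minimal stdlib-only parser for that layout; the function name parseSoftirqs and the sample input are illustrative assumptions, not node_exporter or procfs code.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSoftirqs (hypothetical helper) parses the text of /proc/softirqs
// into a map from softirq type to one counter per CPU column, and also
// returns the number of CPU columns found in the header.
func parseSoftirqs(text string) (map[string][]uint64, int, error) {
	sc := bufio.NewScanner(strings.NewReader(text))
	if !sc.Scan() {
		return nil, 0, fmt.Errorf("empty input")
	}
	// Header line: "CPU0 CPU1 ..." gives the number of CPU columns.
	numCPUs := len(strings.Fields(sc.Text()))
	out := make(map[string][]uint64)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		name := strings.TrimSuffix(fields[0], ":")
		counts := make([]uint64, 0, numCPUs)
		for _, f := range fields[1:] {
			v, err := strconv.ParseUint(f, 10, 64)
			if err != nil {
				return nil, 0, fmt.Errorf("%s: bad value %q: %w", name, f, err)
			}
			counts = append(counts, v)
		}
		out[name] = counts
	}
	return out, numCPUs, sc.Err()
}

func main() {
	sample := "                    CPU0       CPU1\n" +
		"          HI:          1          0\n" +
		"       TIMER:        500        700\n" +
		"      NET_RX:         20         30\n"
	byType, cpus, err := parseSoftirqs(sample)
	if err != nil {
		panic(err)
	}
	// Each (type, CPU) pair would become its own time series.
	fmt.Println("series:", len(byType)*cpus)
}
```

With the 10 softirq types of a recent kernel, a 64-CPU host would produce 10 × 64 = 640 series from this file, compared with 10 from the /proc/stat line.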
How about this: if you think the global counters in /proc/stat are good enough for your use, we add them to the stat collector, but disabled by default, behind a boolean flag --collector.stat.softirq that defaults to false. This way we can add it with no user-visible change by default, and if we find that it's not good enough, we can add a /proc/softirqs collector later (probably also off by default due to cardinality).
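The opt-in behaviour proposed above can be sketched as follows. This is a hedged illustration only: node_exporter registers its flags with kingpin, whereas this sketch uses Go's standard flag package as a stand-in, and the metric name node_softirqs_total and label name vector are hypothetical. The point is that with the flag at its default of false, the output is unchanged.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"sort"
	"strings"
)

// renderSoftirqMetrics returns Prometheus text-format lines for the
// global softirq counters, or nothing when the feature is disabled.
// Metric and label names here are hypothetical.
func renderSoftirqMetrics(enabled bool, counts map[string]uint64) []string {
	if !enabled {
		return nil // default: no user-visible change
	}
	names := make([]string, 0, len(counts))
	for n := range counts {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic output order
	lines := []string{"# TYPE node_softirqs_total counter"}
	for _, n := range names {
		lines = append(lines, fmt.Sprintf("node_softirqs_total{vector=%q} %d", strings.ToLower(n), counts[n]))
	}
	return lines
}

// newFlags registers the proposed boolean flag, defaulting to false.
// The standard flag package accepts both -name and --name forms.
func newFlags() (*flag.FlagSet, *bool) {
	fs := flag.NewFlagSet("softirq_sketch", flag.ContinueOnError)
	softirq := fs.Bool("collector.stat.softirq", false, "Export softirq counters from /proc/stat.")
	return fs, softirq
}

func main() {
	fs, softirq := newFlags()
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}
	// Illustrative counts, e.g. as parsed from the /proc/stat softirq line.
	counts := map[string]uint64{"TIMER": 40, "NET_RX": 30}
	for _, l := range renderSoftirqMetrics(*softirq, counts) {
		fmt.Println(l)
	}
}
```

Run without arguments, it prints nothing; run with --collector.stat.softirq, it prints one counter line per softirq type.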