Feature request: expose linux softirq counters #2220

Closed
jacobvosmaer opened this issue Nov 25, 2021 · 2 comments · Fixed by #2221
Labels
enhancement platform/Linux Linux specific issue

Comments

@jacobvosmaer
Contributor

jacobvosmaer commented Nov 25, 2021

Linux has counters in /proc that show how often different kinds of softirq handlers run. We would like to see these counters to get a hint of where the kernel is spending time in a recurring incident we see on some of our servers. Node_exporter currently does not expose these counters.

There are two ways we could add the counters.

  • /proc/stat. In prometheus/procfs, we already parse the softirq line of that file, which gives global softirq counts.
  • /proc/softirqs. This file gives counts per CPU.
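
For reference, a minimal sketch of reading the global counts from the softirq line of /proc/stat. It uses only the Go standard library rather than the prometheus/procfs parser mentioned above, so the function name `parseStatSoftirq` and the lower-case key names are assumptions made for illustration. The field order (total first, then HI, TIMER, NET_TX, NET_RX, BLOCK, IRQ_POLL, TASKLET, SCHED, HRTIMER, RCU) follows what current kernels print.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// softirqNames lists the softirq types in the order the kernel prints them
// after the total on the "softirq" line of /proc/stat.
// The lower-case spellings are an illustrative choice.
var softirqNames = []string{
	"hi", "timer", "net_tx", "net_rx", "block",
	"irq_poll", "tasklet", "sched", "hrtimer", "rcu",
}

// parseStatSoftirq (hypothetical helper) finds the "softirq" line in the
// contents of /proc/stat and returns the total plus one count per type.
func parseStatSoftirq(stat string) (uint64, map[string]uint64, error) {
	for _, line := range strings.Split(stat, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != "softirq" {
			continue
		}
		want := 2 + len(softirqNames) // label + total + one field per type
		if len(fields) < want {
			return 0, nil, fmt.Errorf("softirq line has %d fields, want at least %d", len(fields), want)
		}
		total, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, nil, fmt.Errorf("parsing total: %w", err)
		}
		counts := make(map[string]uint64, len(softirqNames))
		for i, name := range softirqNames {
			v, err := strconv.ParseUint(fields[2+i], 10, 64)
			if err != nil {
				return 0, nil, fmt.Errorf("parsing %s: %w", name, err)
			}
			counts[name] = v
		}
		return total, counts, nil
	}
	return 0, nil, fmt.Errorf("no softirq line found")
}

func main() {
	// Made-up sample values; on a real host, pass the contents of /proc/stat.
	sample := "cpu  10 0 5 100 0 0 2 0 0 0\nsoftirq 60 1 20 0 9 3 0 2 15 0 10\n"
	total, counts, err := parseStatSoftirq(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(total, counts["timer"], counts["net_rx"])
}
```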

I have no preference between exposing the global counters and the per-CPU counters. Exposing the global counters is less development work (we already parse /proc/stat) and avoids the extra syscalls needed to read a second file. On the other hand, if someone later wants the per-CPU counters, it would be better to spend our time exposing them from the start.
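
To make the per-CPU option concrete, here is a rough sketch of parsing /proc/softirqs, which has a header row of CPU columns followed by one row per softirq type. The helper names `parseSoftirqs` and `sumCounts` are made up for this example and are not part of node_exporter or procfs. Counts are indexed by column position, which I am assuming lines up with the CPU names in the header.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSoftirqs (hypothetical helper) parses the contents of /proc/softirqs
// into a map from softirq name (for example "NET_RX") to a slice of counts,
// one per CPU column in the header row.
func parseSoftirqs(data string) (map[string][]uint64, error) {
	lines := strings.Split(strings.TrimRight(data, "\n"), "\n")
	if len(lines) < 2 {
		return nil, fmt.Errorf("expected a header row and at least one data row")
	}
	numCPU := len(strings.Fields(lines[0])) // header row: CPU0 CPU1 ...
	if numCPU == 0 {
		return nil, fmt.Errorf("header row lists no CPUs")
	}
	out := make(map[string][]uint64)
	for _, line := range lines[1:] {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		if len(fields) != numCPU+1 || !strings.HasSuffix(fields[0], ":") {
			return nil, fmt.Errorf("malformed row %q", line)
		}
		name := strings.TrimSuffix(fields[0], ":")
		counts := make([]uint64, numCPU)
		for i, f := range fields[1:] {
			v, err := strconv.ParseUint(f, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("parsing %s column %d: %w", name, i, err)
			}
			counts[i] = v
		}
		out[name] = counts
	}
	return out, nil
}

// sumCounts collapses the per-CPU counts of one softirq type into a single total.
func sumCounts(vals []uint64) uint64 {
	var total uint64
	for _, v := range vals {
		total += v
	}
	return total
}

func main() {
	// Made-up sample values; on a real host, pass the contents of /proc/softirqs.
	sample := "                    CPU0       CPU1\n" +
		"          HI:          1          2\n" +
		"      NET_RX:         30         12\n"
	perCPU, err := parseSoftirqs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(perCPU["NET_RX"], sumCounts(perCPU["NET_RX"]))
}
```

Summing a row, as `sumCounts` does, recovers a single count per type, so starting with per-CPU data would not rule out a global view later.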

  1. If I were to submit a PR, should it be with the global counters, or per-CPU?
  2. Does this fit under an existing collector, or should we add a new one?
@jacobvosmaer
Contributor Author

@SuperQ I suppose the most natural solution might be to expose the global softirq counters via https://github.com/prometheus/node_exporter/blob/master/collector/stat_linux.go because that collector is for /proc/stat and that is where the global counters live. What do you think?

@SuperQ
Member

SuperQ commented Nov 25, 2021

We expose the softirq time per CPU from /proc/stat as part of node_cpu_seconds_total. But that's only the total per CPU, not the individual softirq types. I guess that's what you're looking for.

Looking at the available data, enabling the per-softirq + per-CPU combination would add a lot of cardinality: one series for every softirq type on every CPU.
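
A back-of-the-envelope sketch of that cardinality concern, assuming the ten softirq types current kernels report and one time series per type (global) or per type and CPU (per-CPU). The function `seriesCount` is purely illustrative.

```go
package main

import "fmt"

// numSoftirqTypes is the number of softirq types assumed in this sketch
// (HI, TIMER, NET_TX, NET_RX, BLOCK, IRQ_POLL, TASKLET, SCHED, HRTIMER, RCU).
const numSoftirqTypes = 10

// seriesCount (hypothetical helper) estimates how many time series one host
// would export: one per type for global counters, or one per type per CPU.
func seriesCount(numCPUs int, perCPU bool) int {
	if perCPU {
		return numSoftirqTypes * numCPUs
	}
	return numSoftirqTypes
}

func main() {
	for _, cpus := range []int{4, 64, 256} {
		fmt.Printf("%3d CPUs: global=%d per-CPU=%d\n",
			cpus, seriesCount(cpus, false), seriesCount(cpus, true))
	}
}
```

Under these assumptions a 64-CPU host would export 640 series for the per-CPU variant against 10 for the global one.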

How about this: if you think the global counters in /proc/stat are good enough for your use, we add them to the stat collector, but disabled by default, behind a boolean flag --collector.stat.softirq that defaults to false. This way we can add them with no default user-visible change, and if we find that they are not good enough, we can add a /proc/softirqs collector later (also probably off by default due to cardinality).
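
A sketch of what the opt-in could look like. It uses the standard library `flag` package to stay self-contained, which is an assumption of this example and not necessarily how node_exporter registers its flags. Only the flag name --collector.stat.softirq and its false default come from the proposal above; `parseArgs`, `enabledSections`, and the section labels are placeholders.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs (hypothetical helper) registers the proposed boolean flag with a
// default of false and reports whether the user switched it on.
func parseArgs(args []string) (bool, error) {
	fs := flag.NewFlagSet("exporter-sketch", flag.ContinueOnError)
	softirq := fs.Bool("collector.stat.softirq", false,
		"Export softirq counters from /proc/stat.")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *softirq, nil
}

// enabledSections (hypothetical helper) lists which parts of /proc/stat this
// sketch would export. The labels are placeholders, not real metric names.
func enabledSections(softirqEnabled bool) []string {
	sections := []string{"existing stat metrics"}
	if softirqEnabled {
		sections = append(sections, "softirq counters")
	}
	return sections
}

func main() {
	enabled, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println(enabledSections(enabled))
}
```

With no arguments the output is unchanged from before the flag existed, which is the "no default user visible change" property; passing --collector.stat.softirq adds the extra section.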
