Process metrics for Linux #7870
Comments
The official node_exporter can handle process metrics; it provides this feature as one of its collectors. So we need to provide it as one of the metrics implemented in node_exporter_metrics. The configuration for process metrics should be as follows:
[INPUT]
    Name node_exporter_metrics
    collector.process.scrape_interval 60
    metrics process
    path.procfs /proc/
    ne.process_name_regex /fluent-bit/
    ne.process_status_regex /R/ |
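To illustrate how the two regex options in the configuration above might narrow down the set of processes, here is a minimal sketch in Python. It is not the plugin's implementation: the `Proc` type, `strip_slashes`, and `select` are hypothetical names made up for this example, and it assumes the `/.../` delimiters are stripped before the pattern is compiled and that both patterns must match.

```python
import re
from typing import Iterable, List, NamedTuple


class Proc(NamedTuple):
    """Hypothetical record for one process (not a Fluent Bit type)."""
    name: str
    state: str  # single-letter state as reported by procfs, e.g. "R" or "S"


def strip_slashes(pattern: str) -> str:
    """Turn a '/.../'-delimited pattern into a plain regex string (assumed convention)."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return pattern[1:-1]
    return pattern


def select(procs: Iterable[Proc], name_regex: str, status_regex: str) -> List[Proc]:
    """Keep only processes whose name AND state match the given patterns."""
    name_re = re.compile(strip_slashes(name_regex))
    status_re = re.compile(strip_slashes(status_regex))
    return [p for p in procs if name_re.search(p.name) and status_re.search(p.state)]


procs = [Proc("fluent-bit", "R"), Proc("fluent-bit", "S"), Proc("sshd", "R")]
selected = select(procs, "/fluent-bit/", "/R/")
print(selected)
```

With these sample records, only the running `fluent-bit` entry passes both filters; the sleeping `fluent-bit` and the running `sshd` are dropped.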
I have already opened a PR that implements processes metrics, meaning the system-level statuses of processes and threads, in in_node_exporter_metrics: #7880 |
@cosmo0920 Thanks for your help. I looked into #7880 and confirmed that the system level metrics are captured.
Do you mean that process-level CPU/memory metrics should be discussed in another PR? |
Yes. I wanted to use this issue and another PR to discuss process-level metrics. |
For reference, we need to implement process metrics similar to: https://github.com/ncabatoff/process-exporter/blob/master/collector/process_collector.go |
@cosmo0920 Thanks.
Yes, I was checking exactly the same code :) Do you think it is possible to implement a feature to scrape the top 10 processes in the input plugin, or should it be implemented in a filter plugin? I would like to hear your thoughts on it. |
I think that scraping the top 10 processes is costly, because determining the order requires traversing procfs. As in the link above, we should implement it by traversing procfs and collecting the metrics of every process. Ordering the metrics to find the top 10 processes should be handled on the monitoring solution side. Another plan: perhaps we need to implement a filtering feature for metrics in cmetrics? |
Agreed @cosmo0920, plus the choice of top 10/9/8/100 will be arbitrary, so it should be left to the user to tune as required. |
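The split described above (collect per-process metrics for everything, rank later) can be sketched roughly as follows. This is an illustrative Python sketch, not the linked Go collector or the Fluent Bit plugin: `ProcSample`, `parse_stat_line`, `scan`, and `top_n` are hypothetical names, and the field positions follow the `/proc/[pid]/stat` layout documented in proc(5).

```python
import heapq
import os
from typing import List, NamedTuple


class ProcSample(NamedTuple):
    """Hypothetical per-process sample (names are illustrative only)."""
    pid: int
    name: str
    state: str
    cpu_ticks: int   # utime + stime, in clock ticks
    rss_pages: int   # resident set size, in pages


def parse_stat_line(line: str) -> ProcSample:
    """Parse one stat line; comm may contain spaces or parentheses,
    so split around the first '(' and the LAST ')'."""
    pid = int(line[: line.index("(")])
    name = line[line.index("(") + 1 : line.rindex(")")]
    rest = line[line.rindex(")") + 1 :].split()
    # rest[0] is field 3 (state), so field n lives at rest[n - 3].
    utime, stime, rss = int(rest[11]), int(rest[12]), int(rest[21])
    return ProcSample(pid, name, rest[0], utime + stime, rss)


def scan(procfs: str = "/proc") -> List[ProcSample]:
    """Traverse procfs once and sample every process that is still alive."""
    samples = []
    for entry in os.listdir(procfs):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(procfs, entry, "stat")) as f:
                samples.append(parse_stat_line(f.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was unreadable
    return samples


def top_n(samples: List[ProcSample], n: int) -> List[ProcSample]:
    """Ranking is cheap once all samples exist, so it can live downstream."""
    return heapq.nlargest(n, samples, key=lambda s: s.cpu_ticks)


SAMPLE = ("1234 (fluent-bit) S 1 1234 1234 0 -1 4194304 100 0 0 0 "
          "250 50 0 0 20 0 4 0 5000 104857600 2560")
print(parse_stat_line(SAMPLE))
```

The sketch ranks by cumulative CPU ticks only to keep it short; a monitoring backend would more likely rank by a rate computed over successive scrapes, with whatever N the user chooses.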
@cosmo0920 @patrick-stephens Thanks.
That makes sense to me. |
@kubotat I sent a PR covering this issue: #7943. I have a question about your request. As I noticed, process status changes rapidly. I mean that the Linux process scheduler depends on this parameter for preemption latency: https://elixir.bootlin.com/linux/v6.5.4/source/kernel/sched/fair.c#L72 That latency could be too small for scraping metrics: it is about three orders of magnitude (3 digits) smaller than the scrape interval for collecting metrics. |
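To put rough numbers on that gap, here is a small arithmetic sketch in Python. Both constants are assumptions made for illustration and are not stated in this thread: a targeted preemption latency of 6 ms (please check the linked kernel source for the actual value) and a scrape interval of 5 seconds.

```python
import math

# ASSUMPTION: targeted preemption latency in nanoseconds (verify against the linked source).
SCHED_LATENCY_NS = 6_000_000
# ASSUMPTION: metrics scrape interval in seconds (not taken from this thread).
SCRAPE_INTERVAL_S = 5


def orders_of_magnitude(scrape_interval_s: float, latency_ns: float) -> float:
    """Return log10 of (scrape interval / scheduler latency)."""
    return math.log10(scrape_interval_s * 1_000_000_000 / latency_ns)


gap = orders_of_magnitude(SCRAPE_INTERVAL_S, SCHED_LATENCY_NS)
print(f"scrape interval is about 10^{gap:.1f} times the scheduler latency")
```

Under these assumptions the gap comes out at roughly three orders of magnitude, so a process can switch between running and sleeping many hundreds of times between two scrapes. With the 60-second interval from the configuration earlier in this thread, the gap would be about one order of magnitude larger.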
@cosmo0920 Thank you so much for your feedback.
Filtering processes by status is not a mandatory requirement. |
OK. I understand. And yes, they are already implemented in #7943. |
@cosmo0920 Is there any timeline when PR #7943 will be merged into the main branch? |
Not sure, but we might be able to include this feature in the 2.2 development cycle... |
Is your feature request related to a problem? Please describe.
The in_process plugin, which can check how healthy a process is, is available today. Having process-level CPU and memory metrics in addition to health information would be beneficial for system operation.
Describe the solution you'd like
As far as I have researched, node_exporter does not support process metrics as of today. So I suggest developing a new plugin which captures process-level metrics from /proc//stat. Here is the expected configuration for the plugin:
The process_name_regex and process_status_regex options give users great flexibility to control which process names are captured and to reduce the amount of data by cutting off unnecessary metrics.
Describe alternatives you've considered
I considered the in_process plugin as an alternative. It helps me check the health status and memory metrics, but it doesn't capture CPU metrics and doesn't work when users don't know the name of the process.
Additional context
None