[hostmetricsreceiver] Add important per-process counters #12482
Comments
Pinging code owners: @dmitryax |
@dgcom is this something you plan to work on? If so I will assign the issue to you. |
I would love to, but I don't have enough time and skills in Go currently to contribute... |
@TylerHelmuth I can take this one. |
@evan-bradley it's yours. |
@dgcom Just to clarify, are you talking about the Windows concept of a process handle? If so, I do not believe the library that the hostmetricsreceiver uses to gather process data currently supports getting this information. The other metrics can be easily scraped. I will be adding voluntary and involuntary context switch counts and an open file descriptor count. |
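For readers unfamiliar with where these Linux counters come from, here is a minimal sketch of reading them straight from procfs. It is not the receiver's implementation (the receiver obtains process data through gopsutil, as noted later in the thread); the helper names `parseCtxSwitches` and `countOpenFDs` are made up for illustration, and the sketch assumes a Linux host with `/proc` mounted.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ctxSwitches holds the two counters exposed in /proc/<pid>/status on Linux.
type ctxSwitches struct {
	Voluntary   int64
	Involuntary int64
}

// parseCtxSwitches (hypothetical helper) extracts voluntary_ctxt_switches and
// nonvoluntary_ctxt_switches from the text of a /proc/<pid>/status file.
func parseCtxSwitches(status string) (ctxSwitches, error) {
	var out ctxSwitches
	found := 0
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		var dst *int64
		switch strings.TrimSpace(key) {
		case "voluntary_ctxt_switches":
			dst = &out.Voluntary
		case "nonvoluntary_ctxt_switches":
			dst = &out.Involuntary
		default:
			continue
		}
		n, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64)
		if err != nil {
			return out, fmt.Errorf("parse %q: %w", key, err)
		}
		*dst = n
		found++
	}
	if found != 2 {
		return out, fmt.Errorf("expected 2 context switch fields, found %d", found)
	}
	return out, nil
}

// countOpenFDs (hypothetical helper) counts the entries of a directory such
// as /proc/self/fd, where each entry is one open file descriptor.
func countOpenFDs(fdDir string) (int, error) {
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("procfs not available:", err)
		return
	}
	cs, err := parseCtxSwitches(string(data))
	fmt.Println("context switches:", cs, "err:", err)
	n, err := countOpenFDs("/proc/self/fd")
	fmt.Println("open fds:", n, "err:", err)
}
```

Keeping the parser separate from the file read makes it testable with a fixed string, which matters when no matching OS is at hand to test against.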
For Windows, the handle count is the "\Process(*)\Handle Count" perfmon counter.
```
# All processes
Get-Counter "\Process(*)\Handle Count"
# Specific process
Get-Counter "\Process(explorer)\Handle Count"
# List all available counters for processes
(Get-Counter -ListSet Process).Paths
```
For thread count, it is "\Process(*)\Thread Count".
Looking at the library used by the receiver - leoluk/perflib_exporter: perflib-based Prometheus exporter for Windows and low-level Go perflib library - I don't see a reason why it wouldn't be able to retrieve available counters... |
Thank you for the clarification. Most process metrics are generated using data obtained from gopsutil, which is the library I was referring to that doesn't yet support getting a process handle count. It does look like perflib_exporter should be able to retrieve this information. I have limited working knowledge around Windows and do not have a Windows environment readily available to test with, so someone else will have to implement that metric within the hostmetricsreceiver. |
I looked at gopsutil and it does not use performance counters at all, which explains why it only supports CPU, memory and a limited number of IO counters. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
What a coincidence - I was actually checking changes in hostmetrics receiver when the bot posted 60 day notice... I see that But process handles seems to be missing... |
@dgcom I wasn't able to add process handles as part of my work, as I don't have a Windows environment to test with. I will leave this issue available for someone else to pick that up. |
@evan-bradley Ok, that's fine, thank you for covering Linux side of things! I'll see if I finally get some time to dig into this myself by the end of the year... Unless someone else will be kind enough to pick this up before that. |
I strongly believe that we should keep this open until it is fully resolved. |
This issue is still relevant and should be kept open until it is resolved. |
I would like handle metrics for Windows as well. I have issue #21379 open for this along with PR #22813 that adds support for a Windows exclusive |
I ended up changing the PR to use a WMI query instead and it ended up being the simplest way to do it. The PR is still waiting on a review at this stage. |
This is great news! Hope we'll close this out once PR is merged... |
**Description:**
Adds a new Windows-exclusive metric called process.handles, which represents the handle count of the given process. When enabled, the receiver will make a WMI query at the beginning of each scrape to update the handle count for all processes on the system. If the metric is enabled on a platform other than Windows, an error will be produced when attempting to refresh handle counts. This roughly matches the behaviour of the Linux-exclusive `process.open_file_descriptors`.

**Link to tracking Issue:** #21379 #12482

**Testing:**
Ran the binary with the following configuration:
```
receivers:
  hostmetrics:
    collection_interval: 2s
    scrapers:
      cpu: {}
      disk: {}
      filesystem: {}
      load: {}
      memory: {}
      network: {}
      paging: {}
      process:
        mute_process_name_error: true
        metrics:
          process.handles:
            enabled: true
      processes: {}
exporters:
  file:
    path: x.json
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [file]
```
The following is an example result of a scrape with this configuration. https://gist.github.com/braydonk/c97996272574319e03111dc79076a1bd
The new |
Great, now need to test it out! |
I tested 0.8.1 and I can see threads and handles counts in Windows - this is great! This issue can be closed now. |
Is your feature request related to a problem? Please describe.
There are several very important per-process metrics which are not yet collected by host metrics receiver, for example:
These can be considered process golden metrics and are needed for most troubleshooting and trend analysis to make sure there are no threads/handles leaks in the process.
Describe the solution you'd like
Collect and emit at least metrics mentioned above.
The ideal solution - collect more per-process metrics (optionally) - include those which are being collected by leading infrastructure monitoring tools on the market.
Describe alternatives you've considered
I have analyzed the per-process metrics collected by competitor tools such as New Relic and Datadog. Their infrastructure agents are able to collect these metrics; however, I would like to use the OTEL collector as a unified agent instead.
Additional context
There is a (great) trend to switch to the OTEL host metrics receiver for infrastructure monitoring (e.g. Signoz, Splunk Observability, New Relic etc.). If such tools all use the same host metrics receiver, they will all miss very important and useful metrics, making troubleshooting and observability much harder.