[hostmetricsreceiver] Add important per-process counters #12482

dgcom · 2022-07-15T04:03:52Z

Is your feature request related to a problem? Please describe.
Several important per-process metrics are not yet collected by the host metrics receiver, for example:

process thread count
process open handles count
process open file descriptor count

These can be considered golden metrics for a process. They are needed for most troubleshooting and trend analysis, for example to make sure a process is not leaking threads or handles.

Describe the solution you'd like
Collect and emit at least the metrics mentioned above.
The ideal solution would (optionally) collect more per-process metrics, including those collected by the leading infrastructure monitoring tools on the market.

Describe alternatives you've considered
I have analyzed the per-process metrics collected by competing tools such as New Relic and Datadog. Their infrastructure agents can collect these metrics, but I would like to use the OTEL collector as a unified agent instead.

Additional context
There is a (great) trend toward using the OTEL host metrics receiver for infrastructure monitoring (e.g. SigNoz, Splunk Observability, New Relic). If these tools all rely on the same host metrics receiver, they will all miss these important and useful metrics, making troubleshooting and observability much harder.

github-actions · 2022-07-15T17:12:32Z

Pinging code owners: @dmitryax

TylerHelmuth · 2022-07-15T17:13:34Z

@dgcom is this something you plan to work on? If so I will assign the issue to you.

dgcom · 2022-07-15T17:15:27Z

> @dgcom is this something you plan to work on? If so I will assign the issue to you.

I would love to, but I currently don't have enough time or Go experience to contribute...

evan-bradley · 2022-07-15T17:46:04Z

@TylerHelmuth I can take this one.

TylerHelmuth · 2022-07-15T17:47:29Z

@evan-bradley it's yours.

evan-bradley · 2022-08-03T19:49:39Z

> process open handles count

@dgcom Just to clarify, are you talking about the Windows concept of a process handle? If so, I do not believe the library that the hostmetricsreceiver uses to gather process data currently supports getting this information.

The other metrics can be easily scraped. I will be adding voluntary and involuntary context switch counts and an open file descriptor count.
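For readers unfamiliar with where these numbers come from on Linux, here is a minimal sketch that reads them straight from procfs. It is not the receiver's code (the receiver is written in Go and relies on a library for process data). The function names and the `proc_root` parameter are hypothetical, chosen only for illustration.

```python
import os
from pathlib import Path

# Fields of interest in /proc/<pid>/status on Linux.
STATUS_FIELDS = ("Threads", "voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def count_open_fds(pid, proc_root="/proc"):
    """Count open file descriptors: /proc/<pid>/fd holds one entry per descriptor."""
    return len(os.listdir(Path(proc_root) / str(pid) / "fd"))


def read_status_counters(pid, proc_root="/proc"):
    """Return the thread count and context switch counts from /proc/<pid>/status."""
    counters = {}
    with open(Path(proc_root) / str(pid) / "status") as status_file:
        for line in status_file:
            # Each line looks like "Threads:\t4".
            key, _, value = line.partition(":")
            if key in STATUS_FIELDS:
                counters[key] = int(value.strip())
    return counters


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    # Only meaningful on systems that expose procfs (e.g. Linux).
    pid = os.getpid()
    print("open fds:", count_open_fds(pid))
    print("status counters:", read_status_counters(pid))
```

Reading `/proc/<pid>/fd` for a process owned by another user typically requires elevated privileges, so a real scraper has to tolerate permission errors for some processes.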

dgcom · 2022-08-03T20:42:36Z

On Windows, the handle count is the "\Process(*)\Handle Count" perfmon counter.
In PowerShell it can be read like this:

# All processes
get-counter "\Process(*)\Handle Count"
# Specific process
get-counter "\Process(explorer)\Handle Count"
# List all available counters for processes
(Get-Counter -ListSet Process).Paths

For the thread count, the counter is "\Process(*)\Thread Count".
Windows does not have a file descriptor counter, so that metric should be available only on Linux.

Looking at the library used by the receiver, leoluk/perflib_exporter (a perflib-based Prometheus exporter for Windows and a low-level Go perflib library), I don't see a reason why it wouldn't be able to retrieve the available counters...

evan-bradley · 2022-08-04T14:34:29Z

Thank you for the clarification. Most process metrics are generated using data obtained from gopsutil, which is the library I was referring to that doesn't yet support getting a process handle count.

It does look like perflib_exporter should be able to retrieve this information. I have limited working knowledge around Windows and do not have a Windows environment readily available to test with, so someone else will have to implement that metric within the hostmetricsreceiver.

dgcom · 2022-08-04T16:30:31Z

I looked at gopsutil and it does not use performance counters at all, which explains why it only supports CPU, memory and a limited number of IO counters.
The best option would be to change the process scraper implementation to use perflib_exporter, which provides more per-process data, and that data is compatible with many other Windows monitoring implementations.
And I know it is hard to write such low-level cross-platform implementations...

github-actions · 2022-11-10T03:48:21Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/hostmetrics: @dmitryax

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dgcom · 2022-11-10T04:04:40Z

What a coincidence - I was actually checking changes in hostmetrics receiver when the bot posted 60 day notice...

I see that process.open_file_descriptors and process.threads are now available:
opentelemetry-collector-contrib/documentation.md at main · open-telemetry/opentelemetry-collector-contrib

But process handles seem to be missing...

evan-bradley · 2022-11-10T14:48:01Z

@dgcom I wasn't able to add process handles as part of my work, as I don't have a Windows environment to test with. I will leave this issue available for someone else to pick that up.

dgcom · 2022-11-10T18:46:30Z

@evan-bradley Ok, that's fine, thank you for covering the Linux side of things! I'll see if I can finally get some time to dig into this myself by the end of the year... unless someone else is kind enough to pick this up before then.

github-actions · 2023-01-10T03:31:09Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/hostmetrics: @dmitryax

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dgcom · 2023-01-11T17:50:50Z

> This issue has been inactive for 60 days.

I strongly believe that we should keep this open until it is fully resolved.

github-actions · 2023-04-10T03:29:46Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/hostmetrics: @dmitryax

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dgcom · 2023-04-10T13:18:20Z

This issue is still relevant and should be kept open until it is resolved.

braydonk · 2023-05-26T19:08:19Z

I would like handle metrics for Windows as well. I have issue #21379 open for this, along with PR #22813, which adds support for a Windows-exclusive process.handles metric. It doesn't use the performance counter but instead uses NtQuerySystemInformation. This approach adds only one new syscall per scrape, which is why I chose it, but perhaps the performance counter would be preferred for simplicity.

braydonk · 2023-06-21T14:21:30Z

I ended up changing the PR to use a WMI query instead, which turned out to be the simplest way to do it. The PR is still waiting on a review at this stage.

dgcom · 2023-06-21T23:34:13Z

This is great news! Hope we'll close this out once the PR is merged...

**Description:** Adds a new Windows-exclusive metric called process.handles, which represents the handle count of the given process. When enabled, the receiver will make a WMI Query at the beginning of each scrape to update the handle count for all processes on the system. If the metric is enabled on a platform other than Windows, an error will be produced when attempting to refresh handle counts. This matches the rough behaviour of the Linux exclusive `process.open_file_descriptors`.

**Link to tracking Issue:** #21379 #12482

**Testing:** Ran the binary with the following configuration:

```
receivers:
  hostmetrics:
    collection_interval: 2s
    scrapers:
      cpu: {}
      disk: {}
      filesystem: {}
      load: {}
      memory: {}
      network: {}
      paging: {}
      process:
        mute_process_name_error: true
        metrics:
          process.handles:
            enabled: true
      processes: {}
exporters:
  file:
    path: x.json
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [file]
```

The following is an example result of a scrape with this configuration. https://gist.github.com/braydonk/c97996272574319e03111dc79076a1bd
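The "one bulk query per scrape, then per-process lookups" behaviour described above can be sketched as follows. This is only an illustration of the pattern, not the PR's Go implementation. `HandleCountCache`, `query_all`, and `unsupported_platform_query` are hypothetical names, and the fake query stands in for a WMI query (on Windows, the `Win32_Process` class exposes `ProcessId` and `HandleCount`; the exact query used by the PR is not shown in this thread).

```python
from typing import Callable, Dict, Iterable, Tuple


class HandleCountCache:
    """Holds handle counts for all processes, refreshed once per scrape."""

    def __init__(self, query_all: Callable[[], Iterable[Tuple[int, int]]]):
        # query_all returns (pid, handle_count) pairs for every process.
        self._query_all = query_all
        self._counts: Dict[int, int] = {}

    def refresh(self) -> None:
        # One bulk query per scrape; any error raised by the query propagates.
        self._counts = dict(self._query_all())

    def get(self, pid: int) -> int:
        # Per-process lookups hit the cache and make no further system calls.
        if pid not in self._counts:
            raise KeyError(f"no handle count recorded for pid {pid}")
        return self._counts[pid]


def unsupported_platform_query() -> Iterable[Tuple[int, int]]:
    """Stand-in for non-Windows platforms, where refreshing produces an error."""
    raise NotImplementedError("handle counts are only available on Windows")


if __name__ == "__main__":
    def fake_query():
        # Hypothetical data standing in for the result of a WMI query.
        return [(4, 2100), (1234, 310)]

    cache = HandleCountCache(fake_query)
    cache.refresh()          # start of a scrape
    print(cache.get(1234))   # prints 310
```

Keeping the bulk query separate from the lookup means the cost per scrape stays constant regardless of how many processes are reported.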

braydonk · 2023-07-05T15:35:27Z

The new process.handles metric is in v0.81!

dgcom · 2023-07-05T17:19:35Z

> The new process.handles metric is in v0.81!

Great, now I need to test it out!

dgcom · 2023-07-07T05:55:09Z

I tested 0.81 and I can see thread and handle counts on Windows - this is great!

This issue can be closed now.

TylerHelmuth added priority:p2 Medium receiver/hostmetrics labels Jul 15, 2022

TylerHelmuth added help wanted Extra attention is needed good first issue Good for newcomers labels Jul 15, 2022

TylerHelmuth assigned evan-bradley Jul 15, 2022

TylerHelmuth removed the help wanted Extra attention is needed label Jul 15, 2022

This was referenced Aug 1, 2022

[receiver/hostmetricsreceiver] Add threads count metric #12802

Merged

Add process.threads open-telemetry/opentelemetry-specification#2705

Merged

evan-bradley mentioned this issue Aug 2, 2022

Add additional process metrics to the metrics semantic conventions open-telemetry/opentelemetry-specification#2706

Merged

evan-bradley mentioned this issue Aug 4, 2022

Add additional process metrics to the metrics semantic conventions open-telemetry/opentelemetry-specification#2708

Closed

evan-bradley mentioned this issue Aug 4, 2022

[receiver/hostmetrics] Add new process metrics #12972

Merged

evan-bradley mentioned this issue Aug 18, 2022

REQUEST: New membership for evan-bradley open-telemetry/community#1136

Closed

github-actions bot added the Stale label Nov 10, 2022

dmitryax removed the Stale label Nov 10, 2022

evan-bradley added the help wanted Extra attention is needed label Nov 10, 2022

evan-bradley removed their assignment Nov 10, 2022

github-actions bot added the Stale label Jan 10, 2023

mx-psi removed the Stale label Feb 7, 2023

github-actions bot added the Stale label Apr 10, 2023

braydonk mentioned this issue May 26, 2023

[receiver/hostmetrics] add process.handles metric #22813

Merged

github-actions bot removed the Stale label May 26, 2023

evan-bradley closed this as completed Jul 10, 2023
