Add ability to mute all errors (mainly due to access rights) coming from process scraper of the hostmetricsreceiver #34981

Merged on Sep 30, 2024 (1 commit)


khalillilahk
Contributor

Description:
We are currently encountering an issue with the process scraper in the hostmetricsreceiver, primarily due to access rights restrictions on certain processes, such as system processes. This results in a large number of verbose error logs. Most of them come from the process.open_file_descriptors metric, but other metrics produce errors as well.

To solve this issue, we added a flag, mute_process_all_errors, that mutes errors coming from the process scraper metrics, as these errors are predominantly associated with processes that we should not be monitoring anyway.

Link to tracking Issue: #20435

Testing: Added unit tests

Documentation:

Errors:

  • Permission denied errors:
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:176
2024-09-02T17:24:10.341+0200    error	scraping metrics        {"kind": "receiver", "name": "hostmetrics/linux/localhost", "data_type": "metrics", "error": "error reading open file descriptor count for process \"systemd\" (pid 1): open /proc/1/fd: permission denied;

  • File not found errors:
go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:176
2024-09-02T17:25:38.688+0200    error   scraperhelper/scrapercontroller.go:200  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics/process", "data_type": "metrics", "error": "error reading cpu times for process \"java\" (pid 466650): open /proc/466650/stat: no such file or directory; error reading memory info for process \"java\" (pid 466650): open /proc/466650/statm: no such file or directory; error reading thread info for process \"java\" (pid 466650): open /proc/466650/status: no such file or directory; error reading cpu times for process \"java\" (pid 474774): open /proc/474774/stat: no such file or directory; error reading memory info for process \"java\" (pid 474774): open /proc/474774/statm: no such file or directory; error reading thread info for process \"java\" (pid 474774): open /proc/474774/status: no such file or directory; error reading cpu times for process \"java\" (pid 481780): open /proc/481780/stat: no such file or directory; error reading memory info for process \"java\" (pid 481780): open /proc/481780/statm: no such file or directory; error reading thread info for process \"java\" (pid 481780): open /proc/481780/status: no such file or directory", "scraper": "process"}

Config:

receivers:
  hostmetrics/process:
    collection_interval: ${PROCESSES_COLLECTION_INTERVAL}s
    scrapers:
      process:
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true
        mute_process_user_error: true
        resource_attributes:
          # disable unused default attributes
          process.command:
            enabled: false
          process.command_line:
            enabled: false
          process.executable.path:
            enabled: false
          process.owner:
            enabled: false
          process.parent_pid:
            enabled: false
        metrics:
          # disable unused default metrics
          process.cpu.time:
            enabled: false
          process.memory.virtual:
            enabled: false
          # enable used optional metrics
          process.cpu.utilization:
            enabled: true
          process.open_file_descriptors:
            enabled: true
          process.threads:
            enabled: true
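
For illustration, here is a minimal sketch of the same receiver with the new flag enabled. The flag name mute_process_all_errors comes from the description above; its placement next to the existing mute_* options under the process scraper is an assumption, since the description does not show it. The metric selection is trimmed to the one metric named as the main source of errors.

```yaml
receivers:
  hostmetrics/process:
    collection_interval: ${PROCESSES_COLLECTION_INTERVAL}s
    scrapers:
      process:
        # Assumed placement: alongside the existing mute_* options.
        mute_process_all_errors: true
        metrics:
          process.open_file_descriptors:
            enabled: true
```

If the flag behaves as described, the permission denied and file not found errors shown above would presumably no longer be logged for this scraper, so it may be worth enabling it only where those errors are known to be expected.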

@khalillilahk khalillilahk requested a review from a team September 3, 2024 14:29

linux-foundation-easycla bot commented Sep 3, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@khalillilahk khalillilahk changed the title from "Add ability to mute errors (mainly due to access rights) coming from process scraper of the hostmetricsreceiver" to "Add ability to mute all errors (mainly due to access rights) coming from process scraper of the hostmetricsreceiver" Sep 4, 2024
@khalillilahk
Contributor Author

@crobert-1 will this PR be merged into the upcoming release?

@crobert-1
Member

> @crobert-1 will this PR be merged into the upcoming release?

I don't know for sure. We're waiting on a project maintainer to merge and optionally a component code owner to review.

It does look like there's some ongoing discussion around the path forward with errors from the process scraper (#34988), so it would be good to hear from code owners if this is an acceptable approach.

@braydonk
Contributor

braydonk commented Sep 6, 2024

For all intents and purposes #34988 is pretty much sure to happen. We'll still have to iron out details. In the meantime, I am fine with bringing this in, and it can be part of the deprecated batch along with all the other mutes once that work is done.

@crobert-1
Member

Thanks for the review and input, @braydonk!

@crobert-1 crobert-1 added the "ready to merge" label (Code review completed; ready to merge by maintainers) Sep 6, 2024
@khalillilahk
Contributor Author

@dmitryax can this PR be merged, please?

@SamerJ
Contributor

SamerJ commented Sep 18, 2024

Hello All,

This PR also applies to our use case.
Today, the error logs we get are too verbose, even though the errors themselves are expected.

Is there any plan to merge this PR?

Thanks in Advance,

@crobert-1
Member

> Is there any plan to merge this PR?

I've added the ready to merge label, which lets the project maintainers know that this PR is ready. They will merge at their earliest convenience 👍

@Dainerx

Dainerx commented Sep 27, 2024

@dmitryax any updates on this one please?

@dmitryax dmitryax closed this Sep 30, 2024
@dmitryax dmitryax reopened this Sep 30, 2024
@dmitryax dmitryax requested a review from a team as a code owner September 30, 2024 20:36

codecov bot commented Sep 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.55%. Comparing base (33362ee) to head (3621e85).
Report is 298 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #34981      +/-   ##
==========================================
+ Coverage   81.27%   81.55%   +0.27%     
==========================================
  Files        2112     2143      +31     
  Lines      165912   176766   +10854     
==========================================
+ Hits       134846   144156    +9310     
- Misses      25940    27305    +1365     
- Partials     5126     5305     +179     


@dmitryax dmitryax merged commit 1b9c8c8 into open-telemetry:main Sep 30, 2024
314 of 315 checks passed
@github-actions github-actions bot added this to the next release milestone Sep 30, 2024
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this pull request Oct 4, 2024
…rom process scraper of the hostmetricsreceiver (open-telemetry#34981)

Labels
ready to merge (Code review completed; ready to merge by maintainers), receiver/hostmetrics

7 participants