
[receiver/hostmetrics] Process scraper ignores root_path when getting process information #24777

Closed
msoumar-ms opened this issue Aug 2, 2023 · 10 comments · Fixed by #26479
Labels
bug (Something isn't working) · receiver/hostmetrics

Comments


msoumar-ms commented Aug 2, 2023

Component(s)

receiver/hostmetrics

What happened?

I am using the hostmetrics process scraper running inside a Docker container on Linux. My receiver configuration is as follows:

receivers:
  hostmetrics: 
    collection_interval: 60s
    root_path: /hostfs
    scrapers:
      # cpu, disk, etc. omitted here
      process:
        mute_process_exe_error: false # 0.82 only
        mute_process_name_error: false

My Docker container is set up in a Docker Compose file as follows:

 otel-collector:
    image: ${OTEL-IMAGE-NAME-HERE}
    container_name: otel-collector
    mem_limit: 100m
    user: "0"
    memswap_limit: -1
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./agent/config/otel-collector.yaml:/etc/otel-collector-config.yaml
      - /proc:/hostfs/proc

Previously I was using version 0.71, which did not have the mute_process_exe_error parameter, so I did not have it in the config at the time. When attempting to scrape processes, I would receive the following error message for every process running on my machine:

2023-08-02T00:37:21.762Z        error   scraperhelper/scrapercontroller.go:212  Error scraping metrics  {"kind": "receiver", "name": 
"hostmetrics", "data_type": "metrics", "error": "error reading process name for pid 1: readlink /hostfs/proc/1/exe: permission denied; 
error reading process name for pid 2: readlink /hostfs/proc/2/exe: permission denied; error reading process name for pid 3: readlink
 /hostfs/proc/3/exe: permission denied; error reading process name for pid 4: readlink /hostfs/proc/4/exe: permission denied;

These are expected errors that have since been addressed by the mute_process_exe_error parameter. In particular, the scraper is reading /hostfs/proc/[pid], which is the expected path since that is how I set up the bind mount.

After updating from version 0.71 to 0.82, keeping everything in the otel config and the docker-compose file the same (except adding mute_process_exe_error, which I have set to false while troubleshooting this issue), I see the following error instead:

2023-08-02T00:16:30.691Z        error   scraperhelper/scrapercontroller.go:200  Error scraping metrics  {"kind": "receiver", "name": 
"hostmetrics", "data_type": "metrics", "error": "error reading username for process \"otelcol-contrib\" (pid 1): open /etc/passwd: 
no such file or directory; error reading process executable for pid 2: readlink /proc/2/exe: no such file or directory; error reading 
process name for pid 2: open /proc/2/status: no such file or directory; error reading process executable for pid 3: readlink 
/proc/3/exe: no such file or directory; error reading process name for pid 3: open /proc/3/status: no such file or directory; error 
reading process executable for pid 4: readlink /proc/4/exe: no such file or directory; error reading process name for pid 4: open 
/proc/4/status: no such file or directory; error reading username for process \"otelcol-contrib\" (pid 6): open /etc/passwd: no 
such file or directory;

Of note, the path it is looking at now is /proc/[pid] rather than /hostfs/proc/[pid], and the processes it finds are the container's own processes (all of the process names are otelcol-contrib, whereas, for example, pid 1 on my machine is the default Linux process /sbin/init). My guess is that the scraper can enumerate all of the host processes that were bind-mounted into the container, but to get the per-process information it still goes to /proc (where the container's internal processes are) instead of /hostfs/proc. If I mute the exe and name errors, the few metrics I do get are for processes all labeled otelcol-contrib with low PIDs, which also suggests these are process metrics from within the container itself.

Collector version

v0.71.0, v0.82.0

Environment information

Environment

OS: Ubuntu 18.04

OpenTelemetry Collector configuration

receivers:
  hostmetrics: 
    collection_interval: 60s
    root_path: /hostfs
    scrapers:
      # cpu, disk, etc. omitted here
      process:
        mute_process_exe_error: false # 0.82 only
        mute_process_name_error: false

processors:

exporters:
  logging:
    verbosity: normal
  prometheusremotewrite:
    # 

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: []
      exporters: [logging, prometheusremotewrite]

Log output

No response

Additional context

No response

msoumar-ms added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Aug 2, 2023

github-actions bot commented Aug 2, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1 (Member)

This looks like it could potentially be caused by #23861, which changed how the root_path config option is handled. @atoulme might have some more context here.


atoulme commented Sep 5, 2023

Do you have any environment variables set?

@msoumar-ms (Author)

If you're referring to the ones on this page: https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/

I do not have any of those set, no.


atoulme commented Sep 5, 2023

No, I'm referring to HOST_PROC specifically.

@msoumar-ms (Author)

No, I don't have that set either.


atoulme commented Sep 5, 2023

OK. Please try with this env var set to /hostfs/proc; that might offer you a workaround. I'll try to reproduce.
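
In docker-compose terms, that would look roughly like the following. This is only a sketch based on the compose snippet above (trimmed to the relevant keys); the environment entry is the only addition:

 otel-collector:
    image: ${OTEL-IMAGE-NAME-HERE}
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yaml"]
    environment:
      # Workaround: point per-process lookups at the bind-mounted host /proc
      # instead of the container's own /proc.
      HOST_PROC: /hostfs/proc
    volumes:
      - ./agent/config/otel-collector.yaml:/etc/otel-collector-config.yaml
      - /proc:/hostfs/proc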


atoulme commented Sep 5, 2023

I have opened a PR to fix a possible collision, as the awscontainerinsightsreceiver was also setting this env var.

@msoumar-ms (Author)

Setting the env variable as you instructed solved the problem, thank you!

@crobert-1 (Member)

/label -needs-triage

github-actions bot removed the needs triage (New item requiring triage) label on Sep 6, 2023
codeboten pushed a commit that referenced this issue Sep 19, 2023
Remove the need to set the environment variable HOST_PROC as part of the
awscontainerinsightsreceiver

#24777
dmitryax pushed a commit that referenced this issue Sep 22, 2023
…g entry (#26479)

**Description:**
A regression introduced in 0.82.0 means that the process scraper doesn't
properly respect the `root_path` configuration key when it comes to
reading process information.

**Link to tracking Issue:**
Fixes #24777 

---------

Co-authored-by: Pablo Baeyens <[email protected]>