
[receiver/hostmetrics/process] "error reading username for process ... error reading parent pid for process ... (pid 1): invalid pid 0" #14311

Closed
MarioAlexis opened this issue Sep 19, 2022 · 14 comments

Comments

@MarioAlexis

MarioAlexis commented Sep 19, 2022

What happened?

Description

Installing the latest otel-collector version (0.60.0), the receiver/hostmetrics/process scraper complains about "invalid pid 0" in a CNF
environment. My limited knowledge of Go does not allow me to determine whether this call "parentPid, err := parentPid(handle, pid)"
blocks the "per process" metrics exposure that the documentation describes, even if we set the parameter "mute_process_name_error" to
true.

Does receiver/hostmetrics/process really expose metrics "per process"? I can only see:

# HELP process_memory_physical_usage The amount of physical memory in use.
# TYPE process_memory_physical_usage gauge
process_memory_physical_usage 5.197824e+07
# HELP process_memory_virtual_usage Virtual memory size.
# TYPE process_memory_virtual_usage gauge
process_memory_virtual_usage 8.02689024e+08

Steps to Reproduce

docker run -v $(pwd)/config.yaml:/etc/otelcol/config.yaml otel/opentelemetry-collector:0.60.0

Expected Result

No error output

Actual Result

        error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading username for process \"otelcol\" (pid 1): open /etc/passwd: no such file or directory; error reading parent pid for process \"otelcol\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/[email protected]/receiver/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/[email protected]/receiver/scraperhelper/scrapercontroller.go:172

Collector version

v0.60.0

OpenTelemetry Collector configuration

receivers:
  hostmetrics:
    collection_interval: 5s
    scrapers:
      filesystem:
      network:
      processes:
      process:
        mute_process_name_error: true
exporters:
  prometheus:
    endpoint: 0.0.0.0:5656
    resource_to_telemetry_conversion:
      enabled: false
service:
  pipelines:
    metrics:
      receivers:
      - hostmetrics
      exporters:
      - prometheus
  telemetry:
    logs:
      level: WARN
    metrics:
      level: detailed

Log output

        error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading username for process \"otelcol\" (pid 1): open /etc/passwd: no such file or directory; error reading parent pid for process \"otelcol\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/[email protected]/receiver/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/[email protected]/receiver/scraperhelper/scrapercontroller.go:172

Additional context

No response

@MarioAlexis MarioAlexis added bug Something isn't working needs triage New item requiring triage labels Sep 19, 2022
@github-actions
Contributor

Pinging code owners: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@evan-bradley evan-bradley added priority:p2 Medium and removed needs triage New item requiring triage labels Sep 26, 2022
@MarioAlexis
Author

Hi

Given that per-process metrics are missing when using receiver/hostmetrics/process, are we talking about a possible regression in that feature?

Here is an example of the metrics that we are currently scraping.

process_memory_virtual_usage{k8s_pod_name="otel-server-6588f888c4-kgkjj",k8s_pod_uid="2551c638-13d6-4e2a-b70d-e86037f2bd57",orchestrator_namespace="platform-dev",process_command="/usr/bin/otelcol",process_command_line="/usr/bin/otelcol --config /opt/myapp/myapp-telemetry/etc/myapp-telemetry-collector-config.yaml",process_executable_name="otelcol",process_executable_path="/usr/bin/otelcontribcol",process_owner="root",process_pid="43"} 1.262133248e+09

process_memory_physical_usage{k8s_pod_name="otel-server-6588f888c4-kgkjj",k8s_pod_uid="2551c638-13d6-4e2a-b70d-e86037f2bd57",orchestrator_namespace="platform-dev",process_command="/usr/bin/otelcol",process_command_line="/usr/bin/otelcol --config /opt/myapp/myapp-telemetry/etc/myapp-telemetry-collector-config.yaml",process_executable_name="otelcol",process_executable_path="/usr/bin/otelcontribcol",process_owner="root",process_pid="43"} 3.43281664e+08

process_cpu_time{k8s_pod_name="otel-server-6588f888c4-kgkjj",k8s_pod_uid="2551c638-13d6-4e2a-b70d-e86037f2bd57",orchestrator_namespace="platform-dev",process_command="/usr/bin/otelcol",process_command_line="/usr/bin/otelcol --config /opt/myapp/myapp-telemetry/etc/myapp-telemetry-collector-config.yaml",process_executable_name="otelcol",process_executable_path="/usr/bin/otelcontribcol",process_owner="root",process_pid="43",state="system"} 566.98

process_cpu_time{k8s_pod_name="otel-server-6588f888c4-kgkjj",k8s_pod_uid="2551c638-13d6-4e2a-b70d-e86037f2bd57",orchestrator_namespace="platform-dev",process_command="/usr/bin/otelcol",process_command_line="/usr/bin/otelcol --config /opt/myapp/myapp-telemetry/etc/myapp-telemetry-collector-config.yaml",process_executable_name="otelcol",process_executable_path="/usr/bin/otelcontribcol",process_owner="root",process_pid="43",state="user"} 834.15

This is not possible anymore on recent versions.
The above metrics come from otel-collector-contrib version 6a1c247 (0.42.0):

$> /usr/bin/otelcontribcol --version
otelcontribcol version 6a1c247

Thank you for your time.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 29, 2022
@dmitryax dmitryax removed the Stale label Nov 29, 2022
@dmitryax
Member

Does the receiver/hostmetrics/process expose really metrics "per process"? I only can see

Yes, it does. The process scraper collects metrics per process running on the host, but, since you run the collector in a container, you have only one process.

@MarioAlexis I'm trying to see what the problem is, and why you don't see metrics anymore. I see that you report different errors:

  • error reading parent pid for process in the issue title
  • error reading username for process in the log output snippets.

Neither of them should cause missing metrics, only missing resource attributes. Do you have any other reported errors, by any chance?

@MarioAlexis
Author

Hi @dmitryax

You're right. I must have discarded a portion of the output while formatting the text. I have updated the issue description with the correct output this time. Sorry about that.

I will update the issue title by adding "error reading username for process", since there are two errors in a single output.

This is still reproducible on version 0.66.0

Neither of them should cause missing metrics

I was thinking that maybe the error logged by otelcol-contrib was causing the missing hostmetrics/process metrics.
The real reason I was not able to get per-process metrics is this parameter inside the prometheus exporter:

    resource_to_telemetry_conversion:
      enabled: false

Setting that to true, I was able to see all metrics for each process.
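
For reference, a minimal sketch of the prometheus exporter block with the conversion enabled (the endpoint value is simply carried over from the config above):

exporters:
  prometheus:
    endpoint: 0.0.0.0:5656
    resource_to_telemetry_conversion:
      enabled: true

With this enabled, resource attributes such as process_pid and process_executable_name are converted into metric labels, which is what makes the per-process series visible in the Prometheus output.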

But otelcol-contrib is still logging this error:

        error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "error reading username for process \"otelcol\" (pid 1): open /etc/passwd: no such file or directory; error reading parent pid for process \"otelcol\" (pid 1): invalid pid 0", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
        go.opentelemetry.io/[email protected]/receiver/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
        go.opentelemetry.io/[email protected]/receiver/scraperhelper/scrapercontroller.go:172

This happens even with the mute_process_name_error param set to true.

@MarioAlexis MarioAlexis changed the title [receiver/hostmetrics/process] "error reading parent pid for process ... (pid 1): invalid pid 0" [receiver/hostmetrics/process] "error reading username for process ... error reading parent pid for process ... (pid 1): invalid pid 0" Nov 30, 2022
@dmitryax
Member

The logged error is expected because you run the collector in a container that has only one process, without any parent process or username. This is not a typical usage of this scraper; it is expected to run on a Linux host, collecting metrics from all other processes.

What's your use case? Do you need CPU/memory metrics of the collector itself? In that case, you can use the hostmetrics memory and cpu scrapers, since your host = one collector-process container. Also, the collector exposes its own metrics, which include CPU/memory as far as I remember, and those can be configured under service::telemetry.
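
As a rough sketch of that suggestion (partial config; exporters and pipelines omitted, and the 0.0.0.0:8888 self-metrics address is an illustrative assumption rather than something taken from this issue):

receivers:
  hostmetrics:
    collection_interval: 5s
    scrapers:
      cpu:
      memory:
service:
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:8888

In this framing, the container is the "host", so the cpu and memory scrapers describe the environment the collector runs in, while the service::telemetry section configures the collector's own self-observability metrics.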

@MarioAlexis
Author

Thank you for clarifying the purpose of this scraper.
We are currently using otelcol-contrib in a container (k8s environment) to scrape static endpoints and expose local per-process metrics.

This is not a typical usage of this scraper, this scraper is expected to be run on a Linux host collecting metrics from all other processes.

Is there another scraper more suitable for a container environment that will expose per-process metrics?
Even if hostmetrics/process is expected to run on a Linux host, it actually works pretty well in a container.
I was wondering if the mute_process_name_error param should also mute that kind of error.

We don't want otelcol to log an error when it isn't really an issue. It could be a warning log.

@dmitryax
Member

dmitryax commented Dec 5, 2022

Is there another scraper more suitable for a container environment that will expose per-process metrics?

If you want to scrape metrics about other processes on an underlying host, you can configure the receiver to fetch host metrics from inside the otel container as described in https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md#collecting-host-metrics-from-inside-a-container-linux-only. Or you can use other k8s-specific receivers, e.g. https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kubeletstatsreceiver
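
A minimal sketch of the first option, assuming the host's root filesystem is mounted into the collector container at /hostfs as the linked README describes:

receivers:
  hostmetrics:
    root_path: /hostfs
    collection_interval: 5s
    scrapers:
      process:
        mute_process_name_error: true

The container additionally needs that host mount (and typically host PID namespace access and elevated privileges) for the process scraper to see other processes; see the README for the exact requirements.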

@MarioAlexis
Author

Hi
Thank you for the answer.

I'm more interested in gathering per-process metrics inside a container that has multiple processes running (including otelcol), without getting this "error reading username / reading parent pid (pid 1)" error, rather than looking for host metrics.

Is that something feasible? Is there any other receiver that is more suitable for that?
It looks like hostmetrics is our best candidate for now.

I appreciate your time on this. Thank you.

@alexchowle
Contributor

This is happening in 0.60.0, too. Nothing special with the config; it's just the collector running as a native process. I get lots of "error reading process name for pid #: readlink /proc/#/exe: no such file or directory" messages (where # is a given low-numbered PID).

@alexchowle
Contributor

Now, having added "mute_process_name_error: true" to the config, I get "... unknown userid xxxxxx" messages instead, where xxxxxx is a number.

@alexchowle
Contributor

alexchowle commented Dec 20, 2022

Heh. I've traced this all the way through the dependency tree. The underlying "os/user" package fails the UID lookup by returning an error if the user does not exist in the "/etc/passwd" file.

Should all process scrapes fail because a UID can't be resolved?

Raised as separate issue #17187

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 26, 2023
dmitryax pushed a commit that referenced this issue Oct 30, 2023
…mute `error reading username for process` (#28661)

**Description:**

add configuration option `mute_process_user_error` to mute `error
reading username for process`

**Link to tracking Issue:**

* #14311
* #17187

Signed-off-by: Dominik Rosiek <[email protected]>
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…mute `error reading username for process` (open-telemetry#28661)

RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this issue Nov 24, 2023
…mute `error reading username for process` (open-telemetry#28661)
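
For completeness, a sketch of how the option introduced by that commit would be combined with the existing one, based purely on the commit description above:

receivers:
  hostmetrics:
    scrapers:
      process:
        mute_process_name_error: true
        mute_process_user_error: true

With both flags set, the scraper should no longer surface either the process-name or the process-username lookup errors that this issue reports.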