Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[service/proctelemetry] Unset HOST_PROC to make sure we use and report the actual process and introduce an option to allow user to overwrite the unset logic and use the value set in the environment variable #7434

Closed

Conversation

dloucasfx
Copy link
Contributor

@dloucasfx dloucasfx commented Mar 27, 2023

Description:

While debugging the below error in k8s env

Error: failed to register process metrics: process does not exist
2023/03/23 03:44:47 main.go:115: application run finished with error: failed to register process metrics: process does not exist

I have noticed that the metric server is calling GOPSUTIL while the HOST_PROC variable is set , this causes gopsutil PidExistsWithContext to retrieve the process from the host instead from the container

func PidExistsWithContext(ctx context.Context, pid int32) (bool, error) {
	if pid <= 0 {
		return false, fmt.Errorf("invalid pid %v", pid)
	}
	proc, err := os.FindProcess(int(pid))
	if err != nil {
		return false, err
	}

	if isMount(common.HostProc()) { // if /<HOST_PROC>/proc exists and is mounted, check if /<HOST_PROC>/proc/<PID> folder exists
		_, err := os.Stat(common.HostProc(strconv.Itoa(int(pid))))
		if os.IsNotExist(err) {
			return false, nil
		}
		return err == nil, err
	}

This PR unsets and resets the host_proc variable and introduces an option to allow the use of host_proc if for whatever reason they need to

Link to tracking Issue:

Testing: < Describe what testing was performed and which tests were added.>
unit tests
Documentation: < Describe the documentation added.>

Please delete paragraphs that you did not use before submitting.

@dloucasfx dloucasfx requested review from a team and mx-psi March 27, 2023 16:24
… introduce an option to allow user to overwrite the unset logic and use the value set in the environment variable

Signed-off-by: Dani Louca <[email protected]>
@dloucasfx dloucasfx changed the title Unset HOST_PROC to make sure we use and report the actual process and introduce an option to allow user to overwrite the unset logic and use the value set in the environment variable [service/proctelemetry] Unset HOST_PROC to make sure we use and report the actual process and introduce an option to allow user to overwrite the unset logic and use the value set in the environment variable Mar 27, 2023
pm.proc, err = process.NewProcess(int32(os.Getpid()))
if err != nil {
return err
}
// restore immediately if it was previously set, don't use defer
if !useHostProcEnvVar && exists {
os.Setenv("HOST_PROC", hostproc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if there's a chance of memoization or is there some guarantee from gopsutil that subsequent lookups w/ the restored env var will be reliably using the hostfs procfs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this should be probably be deferred so that it always happens moving forward regardless of changes in error handling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just realized that subsequent stat updates are all though gopsutil so this would need to happen for each update cycle, which is almost certainly going to cause race conditions w/ the host metrics receiver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, this is a bust! gopsutil re-use the hostproc on every call :(

func (p *Process) fillFromTIDStatWithContext(ctx context.Context, tid int32) (uint64, int32, *cpu.TimesStat, int64, uint32, int32, *PageFaultsStat, error) {
	pid := p.Pid
	var statPath string

	if tid == -1 {
		statPath = common.HostProc(strconv.Itoa(int(pid)), "stat")
	} else {
		statPath = common.HostProc(strconv.Itoa(int(pid)), "task", strconv.Itoa(int(tid)), "stat")
	}

// unset HOST_PROC env variable so the process search occurs locally
if !useHostProcEnvVar && exists {
os.Unsetenv("HOST_PROC")
}
pm.proc, err = process.NewProcess(int32(os.Getpid()))
Copy link
Contributor

@rmfitzpatrick rmfitzpatrick Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that this pid needs to be from the host's perspective** and not gathered from the container's syscall.


// UseHostProcEnvVar when set to true, the metric server, through gopsutil,
// lookup for the otel process in the proc path defined in the HOST_PROC environment variable.
UseHostProcEnvVar bool `mapstructure:"useHostProcEnvVar"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need to expose this? If it's unclear, I would remove this from the PR to avoid extending the public API surface

pm.proc, err = process.NewProcess(int32(os.Getpid()))
if err != nil {
return err
}
// restore immediately if it was previously set, don't use defer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't use defer and process.NewProcess panics we won't set the environment variable back, so I don't think this approach is acceptable. What are the challenges with using defer?

I think a better approach for this would be for gopsutil to allow us specifying the use or not of HOST_PROC instead of us modifying the environment variables. Have we asked the gopsutil maintainer about this?

@dloucasfx
Copy link
Contributor Author

Thanks @rmfitzpatrick @mx-psi for reviewing.
I based my PR on the assumption that gopsutil will only need the host_proc to locate the files once, but after @rmfitzpatrick pointed it out, I also noticed that it reuses host_proc on every call:
example when getting CPU time:

func (p *Process) fillFromTIDStatWithContext(ctx context.Context, tid int32) (uint64, int32, *cpu.TimesStat, int64, uint32, int32, *PageFaultsStat, error) {
	pid := p.Pid
	var statPath string

	if tid == -1 {
		statPath = common.HostProc(strconv.Itoa(int(pid)), "stat")
	} else {
		statPath = common.HostProc(strconv.Itoa(int(pid)), "task", strconv.Itoa(int(tid)), "stat")
	}

so looks like the fix is more involved , so we either need to:

  1. wait for allow to pass context values to override environment variables shirou/gopsutil#1439 to be merged
  2. or implement our own package to pull the data/or use our own fork of gopsutil that includes allow to pass context values to override environment variables shirou/gopsutil#1439

cc: @dmitryax

@dloucasfx
Copy link
Contributor Author

closing PR and moving discussion to #7435

@dloucasfx dloucasfx closed this Mar 27, 2023
codeboten pushed a commit that referenced this pull request Aug 15, 2023
…ariable with a programmatic value (#7998)

Reprising
#7434

**Description:** 

While debugging the below error in k8s env
````
Error: failed to register process metrics: process does not exist
2023/03/23 03:44:47 main.go:115: application run finished with error: failed to register process metrics: process does not exist
````
I have noticed that the metric server is calling GOPSUTIL while the
HOST_PROC variable is set , this causes gopsutil `PidExistsWithContext `
to retrieve the process from the host instead from the container

````
func PidExistsWithContext(ctx context.Context, pid int32) (bool, error) {
	if pid <= 0 {
		return false, fmt.Errorf("invalid pid %v", pid)
	}
	proc, err := os.FindProcess(int(pid))
	if err != nil {
		return false, err
	}

	if isMount(common.HostProc()) { // if /<HOST_PROC>/proc exists and is mounted, check if /<HOST_PROC>/proc/<PID> folder exists
		_, err := os.Stat(common.HostProc(strconv.Itoa(int(pid))))
		if os.IsNotExist(err) {
			return false, nil
		}
		return err == nil, err
	}
````
This PR unsets and resets the host_proc variable and introduces an
option to allow the use of host_proc if for whatever reason they need to

**Link to tracking Issue:**
Fixes #7435

**Testing:**
unit tests

---------

Signed-off-by: Dani Louca <[email protected]>
Co-authored-by: Dani Louca <[email protected]>
Co-authored-by: Alex Boten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants