-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[service/proctelemetry] Unset HOST_PROC to make sure we use and report the actual process and introduce an option to allow user to overwrite the unset logic and use the value set in the environment variable #7434
Conversation
… introduce an option to allow user to overwrite the unset logic and use the value set in the environment variable Signed-off-by: Dani Louca <[email protected]>
a9c5b6f
to
620b2c6
Compare
pm.proc, err = process.NewProcess(int32(os.Getpid())) | ||
if err != nil { | ||
return err | ||
} | ||
// restore immediately if it was previously set, don't use defer | ||
if !useHostProcEnvVar && exists { | ||
os.Setenv("HOST_PROC", hostproc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if there's a chance of memoization or is there some guarantee from gopsutil that subsequent lookups w/ the restored env var will be reliably using the hostfs procfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this should be probably be deferred so that it always happens moving forward regardless of changes in error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just realized that subsequent stat updates are all though gopsutil so this would need to happen for each update cycle, which is almost certainly going to cause race conditions w/ the host metrics receiver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, this is a bust! gopsutil re-use the hostproc on every call :(
func (p *Process) fillFromTIDStatWithContext(ctx context.Context, tid int32) (uint64, int32, *cpu.TimesStat, int64, uint32, int32, *PageFaultsStat, error) {
pid := p.Pid
var statPath string
if tid == -1 {
statPath = common.HostProc(strconv.Itoa(int(pid)), "stat")
} else {
statPath = common.HostProc(strconv.Itoa(int(pid)), "task", strconv.Itoa(int(tid)), "stat")
}
// unset HOST_PROC env variable so the process search occurs locally | ||
if !useHostProcEnvVar && exists { | ||
os.Unsetenv("HOST_PROC") | ||
} | ||
pm.proc, err = process.NewProcess(int32(os.Getpid())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is that this pid needs to be from the host's perspective** and not gathered from the container's syscall.
|
||
// UseHostProcEnvVar when set to true, the metric server, through gopsutil, | ||
// lookup for the otel process in the proc path defined in the HOST_PROC environment variable. | ||
UseHostProcEnvVar bool `mapstructure:"useHostProcEnvVar"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to expose this? If it's unclear, I would remove this from the PR to avoid extending the public API surface
pm.proc, err = process.NewProcess(int32(os.Getpid())) | ||
if err != nil { | ||
return err | ||
} | ||
// restore immediately if it was previously set, don't use defer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't use defer
and process.NewProcess
panics we won't set the environment variable back, so I don't think this approach is acceptable. What are the challenges with using defer
?
I think a better approach for this would be for gopsutil
to allow us specifying the use or not of HOST_PROC
instead of us modifying the environment variables. Have we asked the gopsutil
maintainer about this?
Thanks @rmfitzpatrick @mx-psi for reviewing.
so looks like the fix is more involved , so we either need to:
cc: @dmitryax |
closing PR and moving discussion to #7435 |
…ariable with a programmatic value (#7998) Reprising #7434 **Description:** While debugging the below error in k8s env ```` Error: failed to register process metrics: process does not exist 2023/03/23 03:44:47 main.go:115: application run finished with error: failed to register process metrics: process does not exist ```` I have noticed that the metric server is calling GOPSUTIL while the HOST_PROC variable is set , this causes gopsutil `PidExistsWithContext ` to retrieve the process from the host instead from the container ```` func PidExistsWithContext(ctx context.Context, pid int32) (bool, error) { if pid <= 0 { return false, fmt.Errorf("invalid pid %v", pid) } proc, err := os.FindProcess(int(pid)) if err != nil { return false, err } if isMount(common.HostProc()) { // if /<HOST_PROC>/proc exists and is mounted, check if /<HOST_PROC>/proc/<PID> folder exists _, err := os.Stat(common.HostProc(strconv.Itoa(int(pid)))) if os.IsNotExist(err) { return false, nil } return err == nil, err } ```` This PR unsets and resets the host_proc variable and introduces an option to allow the use of host_proc if for whatever reason they need to **Link to tracking Issue:** Fixes #7435 **Testing:** unit tests --------- Signed-off-by: Dani Louca <[email protected]> Co-authored-by: Dani Louca <[email protected]> Co-authored-by: Alex Boten <[email protected]>
Description:
While debugging the below error in k8s env
I have noticed that the metric server is calling GOPSUTIL while the HOST_PROC variable is set , this causes gopsutil
PidExistsWithContext
to retrieve the process from the host instead from the containerThis PR unsets and resets the host_proc variable and introduces an option to allow the use of host_proc if for whatever reason they need to
Link to tracking Issue:
Testing: < Describe what testing was performed and which tests were added.>
unit tests
Documentation: < Describe the documentation added.>
Please delete paragraphs that you did not use before submitting.