Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Failed test][cmd/opampsupervisor] TestSupervisorStopsAgentProcessWithEmptyConfigMap constantly failed #36764

Closed
songy23 opened this issue Dec 10, 2024 · 3 comments · Fixed by #36754
Assignees
Labels
ci-cd CI, CD, testing, build issues cmd/opampsupervisor

Comments

@songy23
Copy link
Member

songy23 commented Dec 10, 2024

Component(s)

cmd/opampsupervisor

Describe the issue you're reporting

E.g. https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/12257185021/job/34195668020

    e2e_test.go:1351: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/cmd/opampsupervisor/e2e_test.go:1356
        	            				C:/Users/runneradmin/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695
        	Error:      	An error is expected but got nil.
    e2e_test.go:1351: 
        	Error Trace:	D:/a/opentelemetry-collector-contrib/opentelemetry-collector-contrib/cmd/opampsupervisor/e2e_test.go:1351
        	Error:      	Condition never satisfied
        	Test:       	TestSupervisorStopsAgentProcessWithEmptyConfigMap
@songy23 songy23 added ci-cd CI, CD, testing, build issues needs triage New item requiring triage labels Dec 10, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bacherfl
Copy link
Contributor

I think I have found the reason for that - My recent PR (#35892) introduced a change to the opamp agent to listen for status updates for the individual components, and it seems like in the shutdown function of the opamp agent there is a channel that is not properly closed. I'm already working on a fix and should have a PR ready today

@bacherfl bacherfl self-assigned this Dec 11, 2024
@bacherfl bacherfl removed the needs triage New item requiring triage label Dec 11, 2024
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue Dec 17, 2024
…pen-telemetry#36754)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

The changes introduced in
open-telemetry#35892
seemed to have introduced some flakyness in the opampsupervisor e2e
tests, as the shutdown of the opamp agent waits for the component health
loop to end. Due to an unclosed channel within the opamp agent however,
the agent does not properly shut down, and the supervisor runs into a
timeout before ultimately sending a SIGKILL to the agent process.
Closing the channel in the Shutdown method of the opamp extension fixes
that and the agent gets shut down properly upon the reception of the
SIGINT signal

#### Link to tracking Issue:

Fixes open-telemetry#36764

#### Testing

This fixes the failing test mentioned in the issue (open-telemetry#36764)

---------

Signed-off-by: Florian Bacher <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-cd CI, CD, testing, build issues cmd/opampsupervisor
Projects
None yet
3 participants