Startup deadlock when component uses host.ReportFatalError()
#8116
Comments
@mwear This is related to your current work.
I think the same underlying issue still exists. I can look into this more and either fix or close this when I have more details. Feel free to assign me.
Why does a component need to report fatal status during startup via a separate API when it can already return an error from `Start()`?
If a component wants to return a fatal error from `Start()` and can do so synchronously, it should just return an error from `Start()`. A number of components start async work in goroutines, and if they want to report a fatal error from there, they need a separate API. We should audit components for non-async fatals. Now that we have component status reporting and the notion of a permanent error, we can consider whether some fatals should be permanent. The difference is that a permanent error lets a collector continue running in a degraded mode, but requires human intervention to fix.
I filed #9324 as a related issue - it's not clear to me when it's admissible to return an error from the Start function, and what type of error to return.
Describe the bug
If a component calls `host.ReportFatalError()` during startup, the caller blocks on the asynchronous error channel. The asynchronous error channel has no consumer at this point, because `Start()` hasn't finished and the collector isn't running yet.
Startup is therefore blocked, but by this point many components have already started and are writing logs, which makes it easy to miss that `Start()` never finished. The signal handlers have not been installed yet either, which further complicates things for a user trying to understand this scenario.
The documentation says not to call `ReportFatalError()` from `Start()`, so the problem is somewhat mitigated by telling users to be careful.
Steps to reproduce
Call `host.ReportFatalError()` during `Start()`.
What did you expect to see?
The collector would fail during startup.
What did you see instead?
As described above, a deadlock happens.
What version did you use?
v0.80.0
What config did you use?
Won't paste it, since we understand the cause.
The offending component's incorrect use:
https://github.com/lightstep/telemetry-generator/blob/e86897152cf38767d2538da629858e7680f12894/generatorreceiver/generator_receiver.go#L50