-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graceful shutdown when a component panics #3955
Comments
@alolita please assign this issue to me. |
@PaurushGarg @mx-psi I'm wondering if this is still being worked on. In our distribution of the collector it seems like panics are becoming more common as the number of components in contrib grows. Right now there's not a great way to gracefully handle panics from components and pass them along to the caller of the collector. |
I am not actively working on this, feel free to take it if you want. My main concern with this was graceful shutdown, re-reading this maybe we want to raise the panic again after the shutdown. Feel free to open a PR and we can discuss there |
Got it. I don't have capacity currently but I do think my team is interested in solving this issue. We may take this on in the next few weeks. |
Note that I may have misunderstood something obvious here, so please let me know if that's the case. SummaryI don't think a central Practically what this means is that to enforce graceful shutdown on panics, each component would need to implement recover functionality for each spawned goroutine. Even this may not be possible to some extent, as components may rely on a dependency that starts its own goroutines which in turn could then panic. Practical exampleI tested using the OTLP receiver ingesting traces using the GRPC protocol.
I added a panic here for testing. The following is the stack trace of different goroutines being spawned by the OTLP receiver to get to the introduced panic. Stack traces
The result of this is that even the receiver's main goroutine calling |
@crobert-1 I think the main factor on deciding here is whether the amount of panics that we would be able to catch with one or more recovers is useful/significant to justify the effort of implementing this. We won't be able to catch all panics, but we would be able to catch some. |
May have some dependency on #9324 |
I am not sure if this should be a target for service 1.0 (and therefore Collector 1.0), it's something we can do after 1.0 as an enhancement. Any thoughts? |
I would vote for not including in 1.0. |
Agreed, I removed this |
Is your feature request related to a problem? Please describe.
When a component panics the Collector does not gracefully handle the panic and abruptly stops.
Describe the solution you'd like
The Collector would
recover
from the panic, correctly log it and gracefully shut down. This could be done with a catch-allrecover
call on theservice.Collector
code or at the level of components helpers which wouldrecover
and report a fatal error.Describe alternatives you've considered
Continue as is without graceful shutdown.
The text was updated successfully, but these errors were encountered: