Crashpad Handler Unintended Behaviors after installation of Sentry #900
Hi @JLuse, I need help understanding the issue (without access to Zendesk for context).
Why is it necessary to manually kill the crashpad_handler?
This one needs to be clarified, too: which plugin are you referring to? Is this happening in the context of Unity or Unreal? The words "plugin" and "host" do not refer to anything specific in sentry-native terminology, so this is either usage from the customer domain or from one of the downstream SDKs. CC: @kahest, @bitsandfoxes
Hi @supervacuus, I may be able to help you with some of those specifics ...
We're using Crashpad in the context of an audio plugin, loaded within a DAW. The reason we are killing the handler is so that, when the last plugin instance is removed from the DAW, we also remove any process that the first instance may have spun up.
"Plugin" and "Host" here refer to a VST3 or AU audio plugin running within a host process, typically a DAW of some description. The problem seems to be that while two Crashpad systems appeared to be able to run side-by-side in this fashion, after upgrading to Sentry the plugin (which continues to use vanilla Crashpad) when pulling down its own handler it somehow (and mysteriously to someone like me who doesn't know Mach-O well) it also pulls down the Sentry handler. This then causes the further problem that once the host's Sentry Hopefully that helps? I understand that it is hardly a standard set-up, but was hoping to get some guidance on what exactly the Mach-O interaction is that causes both handlers to die. Cheers, |
Hi @willmbaker, thank you for the context, we'll discuss and follow up here!
Hi @willmbaker, thank you for the additional information. However, before we can further discuss this internally, I need to ensure that we know exactly what is happening.
Okay, I understand. There can be multiple plugin instances loaded in the DAW process. Each instance relies on crashpad for error-reporting. The first instance started spins up the crashpad_handler.
Here, I have trouble following. So, previously, you were running multiple (vanilla) crashpad clients (and related handlers) within the same process space? One for the host and one (or multiple) for the plugin instances? Then you switched to Sentry's Native SDK, but only for one of those clients (the one in the host)? The plugin client(s) still run(s) with vanilla crashpad?

Can you elaborate on the choice to run multiple crashpad clients? Is it because the plugins can run in other DAWs, and so are deployed by default with a separate crashpad configuration, but you also develop a DAW, and now, when you run your plugins in your own DAW, you end up in this situation? Or do you require isolation between the reports?
This one I can answer right away: we start the crashpad_handler with a restarter thread that brings it back up if it terminates. This is why your host process gets stuck when killing the handler. I can imagine that two or more clients starting multiple handlers could confuse this mechanism, which may explain why you see the interaction where killing one handler affects the other. To boil this down:
That is correct.
Yup, that's it.
It is as you say, our plugin (which existed before the DAW) may run in the context of other DAWs, not just ours.
That's interesting ... it makes sense that it would get torn down and then start again; however, the behaviour we're seeing is that it is stuck in an infinite loop trying to restart, but it never does. If it correctly restarted, we could easily live with the behaviour of it being torn down when the last plugin is closed; what it actually does, though, is continually spin up the crashpad_handler without it ever successfully coming back.
I think that restarting when it dies is good behaviour, but it should just restart the one time.
This was the big question for me, as my understanding of Mach-O is not so good that I can determine through what action one process is being pulled down because another, unrelated, process is killed. I would also like to emphasize the restarting behaviour ... I'm not sure if you have access to the minimal reproduction case that I sent in (I've attached a bare Git repository of it here). Cheers, and thanks for the help so far!
Thanks for the details @willmbaker! I haven't forgotten to respond, but this turned into a more elaborate investigation. I will write a summary of the situation and possible solution approaches either today or tomorrow. Thanks for your patience.
No problem at all, thanks for your attention! :)
There are multiple issues at play here. Let's focus on one thing first: I was able to reproduce the described behavior on macOS. The restart thread continuously tries to restart the crashpad_handler, but the handler never comes back up. The logs you see come directly from Crashpad, not the Native SDK. Further, I could reproduce the issue with vanilla upstream Crashpad. This means the problem is in the implementation of the restarter thread upstream. I have not yet dived into the "why," but something has probably changed in the behavior of mach-ports in recent macOS versions. Which macOS version are you testing this on?
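For illustration, the restart mechanism boils down to a pattern like the following. This is a deliberately simplified sketch, not Crashpad's actual implementation:

```cpp
// Simplified sketch of a "restart the handler if it dies" thread.
// This is NOT Crashpad's code; it only illustrates the pattern under discussion.
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

// Runs on a dedicated thread for the lifetime of the client process.
void restarter_loop(const char *handler_path, char *const argv[]) {
  for (;;) {
    pid_t pid = 0;
    if (posix_spawn(&pid, handler_path, nullptr, nullptr, argv, environ) != 0) {
      return;  // the handler executable could not be started at all
    }
    int status = 0;
    waitpid(pid, &status, 0);  // blocks until the handler exits (e.g. kill -TERM)
    // The handler died, so loop around and spawn a new one. If the fresh
    // instance cannot re-establish what it needs (e.g. the mach exception
    // port handshake), this turns into the endless respawn loop you observe.
  }
}
```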
Since no interface allows you to shut down the restarter thread, in this configuration, you will never be able to terminate the crashpad_handler started by the Native SDK. As far as I understood, your intention isn't to kill the Native SDK's handler anyway, but the plugin's handler once the last plugin instance is removed. Your problem is that terminating the plugin's handler also takes down the host's handler.

In short, the crashpad client is stateful (referencing the exception port), and if we do not keep it around during the lifecycle of the Native SDK, killing a handler in the process can leave the host's crash reporting in a broken state. This is also the case when you start two handlers from one client (and, again, I was able to replicate this with vanilla crashpad). We didn't need to keep the client around before, but that could solve your problem.
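To sketch the difference (illustration only, with placeholder paths; this is not the Native SDK's actual code):

```cpp
// Illustration of why the lifetime of the CrashpadClient instance matters.
#include "client/crashpad_client.h"
#include "base/files/file_path.h"

// Variant A: the client is a temporary. Once it goes out of scope, nothing in
// the process holds on to the state it carries (e.g. the send right to the
// exception port it set up for the handler).
void start_handler_with_scoped_client() {
  crashpad::CrashpadClient client;
  client.StartHandler(base::FilePath("/path/to/crashpad_handler"),
                      base::FilePath("/path/to/db"), base::FilePath(),
                      /*url=*/"", /*annotations=*/{}, /*arguments=*/{},
                      /*restartable=*/true, /*asynchronous_start=*/false);
}  // client destroyed here

// Variant B: the client lives as long as the SDK is initialized, so the state
// it holds (including the exception-port send right) stays around; this is
// the change discussed further below.
crashpad::CrashpadClient g_client;

void start_handler_with_long_lived_client() {
  g_client.StartHandler(base::FilePath("/path/to/crashpad_handler"),
                        base::FilePath("/path/to/db"), base::FilePath(),
                        /*url=*/"", /*annotations=*/{}, /*arguments=*/{},
                        /*restartable=*/true, /*asynchronous_start=*/false);
}
```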
In general, all crash-handling/signaling mechanisms register per process (this is the same on Windows with its various handlers, Linux/Unix signal handlers, and mach exception ports on macOS). The topic is more complicated when you consider multiple threads; however, these threads still share the same singular handler implementation. Whether installing multiple handlers "works" depends on the order of initialization, whether the mechanism allows the safe invocation of previously installed handlers, and whether the handlers cooperate. Most of the time, one handler will overwrite the other and then be the sole crash-reporting mechanism (on any OS).

Case in point (on macOS): if you start two different crashpad handlers via respective client instances in a single process, only the second one will report a crash. Further, if you then terminate the second handler, you are left without any working crash reporting.

I understand the situation in which you are using two handlers, but be aware that this might lead to hard-to-diagnose behavior (independent of whether terminating one handler also affects the other). Starting Crashpad in the plugin when loaded in another DAW can be even more problematic because your handler could overwrite the behavior of another crash handler installed by your host, which might report crashes that have nothing to do with your plugin. At the same time, the developer of the host could miss relevant crashes because you received them instead.

Summary of the issues:
Great, thank you @supervacuus. So what I'm hearing is that you don't believe it is possible to engage more than one crash handler in the same process? That said, I'm willing to accept your explanation, particularly as you say it happens in vanilla Crashpad and may be a change in Mach-O behaviour ... we've been seeing it on our automation machines, which I believe are on Catalina at the moment.

However, at point 4 there, are you saying that you do think it would be possible to change the current behaviour of the plugin pulling down the host handler? At point 1, am I correct that you're saying that you don't think crash reports would be sent for both handlers?

I hear what you're saying on reviewing the architecture, and will do so. I feel like we predominantly get crashes from other plugins running in the same DAW, so it has been of limited utility; to a certain degree, we tend to catch crashes in automated tests via the host's crash reporting before we ship anyway. Again, thanks for your help and attention :)
It's possible to start two or more clients/handlers, but as described, they probably won't do what you'd expect.
We can fix this and keep the client instance during the Native SDK life cycle (and, with it, the send right to the exception port). Incidentally, this would prevent our handler from being pulled down together with yours.
Yes, only the last client/handler pair that was instantiated on macOS will send crash reports.
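To make the "last registration wins" behavior concrete, here is an analogy using plain POSIX signal handlers (simpler to show than mach exception ports, but the per-process registration semantics are the same in spirit; the handler names are made up for the example):

```cpp
// Analogy using POSIX signal handlers rather than mach exception ports:
// registration is per process, and the last sigaction() call wins unless the
// new handler explicitly saves and forwards to the previous one.
#include <signal.h>
#include <cstdio>

// fprintf is not async-signal-safe; acceptable for a demo only.
void host_handler(int sig)   { std::fprintf(stderr, "host handler: %d\n", sig); }
void plugin_handler(int sig) { std::fprintf(stderr, "plugin handler: %d\n", sig); }

int main() {
  struct sigaction host_action {};
  host_action.sa_handler = host_handler;
  sigaction(SIGUSR1, &host_action, nullptr);       // "host" registers first

  struct sigaction plugin_action {}, previous {};
  plugin_action.sa_handler = plugin_handler;
  sigaction(SIGUSR1, &plugin_action, &previous);   // "plugin" registers second and
                                                   // silently replaces the host's
                                                   // handler; `previous` is the only
                                                   // record of what was there before

  raise(SIGUSR1);  // prints "plugin handler" only; the host never sees the signal
  return 0;
}
```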
Yeah, I can imagine this to be the case. Ideally, you want some form of in-process isolation, but none of the current crash-handling mechanisms (on any platform) can provide that directly (they are all process-global). Isolation can happen only as a post-processing step in the backend, depending on the meta-data available (or solely on the stack trace of the crashing thread). Our current "scope" implementation in the Native SDK doesn't support multiple execution contexts. This might not help in an architecture where multiple plugins aren't running in full (thread) isolation but, for instance, are scheduled to run from a small pool of threads. But even in a scenario where you'd have better scoping support, it would still have to be one global handler taking care of all crashes.
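If you do end up with a single process-wide handler, attaching identifying metadata to the scope can at least help separate reports in the backend. A minimal sketch using the Native SDK's C API (the tag names and the callback are made up for the example):

```cpp
// Hypothetical example: tag events so that reports can be filtered or grouped
// in the backend. This does not isolate crashes per plugin; it only annotates
// whatever the single process-wide handler ends up reporting.
#include <sentry.h>

void on_plugin_instance_created(const char *plugin_name, const char *plugin_version) {
  sentry_set_tag("loaded_plugin", plugin_name);            // made-up tag name
  sentry_set_tag("loaded_plugin_version", plugin_version); // made-up tag name
}
```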
You're very welcome.
Great, thanks for your help @supervacuus ... We are going to re-think our use of crash reporting in plugins after your advice. I'm happy if you close this as not a bug or similar, thank you so much for your time :) |
@willmbaker, we added the fix via #910. It will be part of the next release. |
The fix is part of the 0.7.0 release. |
Description
A customer has reported the following issues only after updating one of their products (the host) to use Sentry.
Issue 1: When the crashpad_handler process is terminated (e.g., using kill -TERM <pid>), the client repetitively attempts to restart the handler, leading to consistent failures.
Issue 2: On teardown of the plugin's crashpad_handler instance, it inadvertently terminates the host process's crashpad_handler. This behavior is observed even when manually terminating the plugin's handler (which is confirmed to be a separate process with a distinct PID); soon after, the host's crashpad handler is also terminated.
Prior to this, the customer indicated, when both the host and the plugin were using Crashpad directly for crash reporting, these issues were not observed.
Steps To Reproduce & Log output
Customer has provided two repro apps as well as accompanying logs within the support ticket:
https://sentry.zendesk.com/agent/tickets/105906