-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
layer_shell: keep output non-NULL wherever possible #6820
Conversation
Our layer shell implementation assigns every layer surface to an output on creation. It tracks this output using the output field on the underlying wlr_layer_surface_v1 structure. As such, much of the existing code assumes that output is always non-NULL and omits NULL checks accordingly. However, there are currently two cases where we destroy a sway_layer_surface and output is NULL. The first is when we can't find an output to assign the surface to and destroy it immediately after creation. The second is when we destroy a surface in response to its output getting destroyed, as we set output to NULL in handle_output_destroy() before we call wlr_layer_surface_v1_destroy(), which is what calls the appropriate unmap and destroy callbacks. The former case doesn't cause any problems, since we haven't even allocated a sway_layer_surface at that point or registered any callbacks. The latter case, however, currently triggers a crash (swaywm#6120) if a popup is visible, since our popup_handle_unmap() implementation can't handle a NULL output. To fix this issue, keep output set until right before we free the sway_layer_surface. All we need to do is remove some of the cleanup logic from handle_output_destroy(), since as of commit c9060bc ("layer-shell: replace close() with destroy()") that same logic is guaranteed to be happen later when wlroots calls handle_destroy() as part of wlr_layer_surface_v1_destroy(). This lets us remove some NULL checks from other unmap/destroy callbacks, which is nice. We also don't need to check that the wlr_output points to a valid sway_output anymore, since we unset that pointer after disabling the output as of commit a0bbe67 ("Address emersions comments on output re-enabling") Just to be safe, I've added assertions that the wlr_output is non-NULL wherever we use it. Fixes swaywm#6120.
Updated to also remove NULL checks in |
FYI I am currently running the effort of doing the migration to wlr_scene #6562. My branch rewrites this code in question. I haven't look at the code too closely but it seems to me like this patch fixes a NULL deference but introduces a use-after-free/assert. If an output got disabled while a popup was active, the sway_output is now freed, and since we're no longer nulling out the sway_output on line 286 things will start trying to read/write from garbage memory potentially if a transaction happens in the meantime. On the flip side, if a popup happens while all outputs are disabled, then handle_layer_shell_surface won't assign a output and NULL will start pinging off the walls anyway. |
Can you elaborate on exactly which "meantime" you mean here? It's quite possible that this code introduces a potential use-after-free; like I noted, I'm not totally confident in my understanding of the lifecycle here. My current understanding is that this change is safe because we still NULL out the pointer and remove the appropriate list items later on, in |
I did consider this case as well, and my commit message addresses it:
Edit: actually, what I considered was when an entirely new layer appears while all outputs are disabled, not when a popup of an existing layer appears. But I don't think the latter case can ever happen, since we destroy a layer as soon as its output goes away. Is there any time in which it would have a chance to make a popup after the output goes away but before it's destroyed? |
Thinking about this more, I don't think this is possible because wl_signals get called synchronously. This is what I was thinking:
Right, nevermind. This case isn't possible. This LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this LGTM!
Thanks a lot for this fix! I'm wondering when it will land in a release? I just hit that issue (#6855 to be precise, I disconnected a laptop from the docking station without opening it), and it seems to be hit quite frequently, going by the comments on various issues. What are the release plans for the next version? If the next release is far away, would you mind doing a point release with just this change (and possibly other fixes)? Thank you again for your work! |
NOTE: Please review this PR carefully, as I am not confident enough in my understanding of the lifecycle of the objects in question to be sure that I'm not introducing some other bug (crash, memory leak, etc) with this change.
Our layer shell implementation assigns every layer surface to an output on creation. It tracks this output using the
output
field on the underlyingwlr_layer_surface_v1
structure. As such, much of the existing code assumes thatoutput
is always non-NULL and omits NULL checks accordingly.However, there are currently two cases where we destroy a
sway_layer_surface
andoutput
is NULL. The first is when we can't find an output to assign the surface to and destroy it immediately after creation. The second is when we destroy a surface in response to its output getting destroyed, as we set output to NULL inhandle_output_destroy()
before we callwlr_layer_surface_v1_destroy()
, which is what calls the appropriate unmap and destroy callbacks.The former case doesn't cause any problems, since we haven't even allocated a
sway_layer_surface
at that point or registered any callbacks. The latter case, however, currently triggers a crash (#6120) if a popup is visible, since ourpopup_handle_unmap()
implementation can't handle a NULL output.To fix this issue, keep output set until right before we free the sway_layer_surface. All we need to do is remove some of the cleanup logic from
handle_output_destroy()
, since as of commit c9060bc ("layer-shell: replace close() with destroy()") that same logic is guaranteed to be happen later when wlroots callshandle_destroy()
as part ofwlr_layer_surface_v1_destroy()
.This lets us remove some NULL checks from other unmap/destroy callbacks, which is nice. We also don't need to check that the
wlr_output
points to a validsway_output
anymore, since we unset that pointer after disabling the output as of commit a0bbe67 ("Address emersions comments on output re-enabling") Just to be safe, I've added assertions that the wlr_output is non-NULL wherever we use it.Fixes #6120.