layer_shell: keep output non-NULL wherever possible #6820

tchebb · 2022-02-06T20:45:27Z

NOTE: Please review this PR carefully, as I am not confident enough in my understanding of the lifecycle of the objects in question to be sure that I'm not introducing some other bug (crash, memory leak, etc) with this change.

Our layer shell implementation assigns every layer surface to an output on creation. It tracks this output using the output field on the underlying wlr_layer_surface_v1 structure. As such, much of the existing code assumes that output is always non-NULL and omits NULL checks accordingly.

However, there are currently two cases where we destroy a sway_layer_surface and output is NULL. The first is when we can't find an output to assign the surface to and destroy it immediately after creation. The second is when we destroy a surface in response to its output getting destroyed, as we set output to NULL in handle_output_destroy() before we call wlr_layer_surface_v1_destroy(), which is what calls the appropriate unmap and destroy callbacks.

The former case doesn't cause any problems, since we haven't even allocated a sway_layer_surface at that point or registered any callbacks. The latter case, however, currently triggers a crash (#6120) if a popup is visible, since our popup_handle_unmap() implementation can't handle a NULL output.

To fix this issue, keep output set until right before we free the sway_layer_surface. All we need to do is remove some of the cleanup logic from handle_output_destroy(), since as of commit c9060bc ("layer-shell: replace close() with destroy()") that same logic is guaranteed to be happen later when wlroots calls handle_destroy() as part of wlr_layer_surface_v1_destroy().

This lets us remove some NULL checks from other unmap/destroy callbacks, which is nice. We also don't need to check that the wlr_output points to a valid sway_output anymore, since we unset that pointer after disabling the output as of commit a0bbe67 ("Address emersions comments on output re-enabling") Just to be safe, I've added assertions that the wlr_output is non-NULL wherever we use it.

Fixes #6120.

Our layer shell implementation assigns every layer surface to an output on creation. It tracks this output using the output field on the underlying wlr_layer_surface_v1 structure. As such, much of the existing code assumes that output is always non-NULL and omits NULL checks accordingly. However, there are currently two cases where we destroy a sway_layer_surface and output is NULL. The first is when we can't find an output to assign the surface to and destroy it immediately after creation. The second is when we destroy a surface in response to its output getting destroyed, as we set output to NULL in handle_output_destroy() before we call wlr_layer_surface_v1_destroy(), which is what calls the appropriate unmap and destroy callbacks. The former case doesn't cause any problems, since we haven't even allocated a sway_layer_surface at that point or registered any callbacks. The latter case, however, currently triggers a crash (swaywm#6120) if a popup is visible, since our popup_handle_unmap() implementation can't handle a NULL output. To fix this issue, keep output set until right before we free the sway_layer_surface. All we need to do is remove some of the cleanup logic from handle_output_destroy(), since as of commit c9060bc ("layer-shell: replace close() with destroy()") that same logic is guaranteed to be happen later when wlroots calls handle_destroy() as part of wlr_layer_surface_v1_destroy(). This lets us remove some NULL checks from other unmap/destroy callbacks, which is nice. We also don't need to check that the wlr_output points to a valid sway_output anymore, since we unset that pointer after disabling the output as of commit a0bbe67 ("Address emersions comments on output re-enabling") Just to be safe, I've added assertions that the wlr_output is non-NULL wherever we use it. Fixes swaywm#6120.

tchebb · 2022-02-06T21:15:57Z

Updated to also remove NULL checks in handle_destroy(), since I don't think those are needed either after this change.

Nefsen402 · 2022-02-11T23:00:49Z

FYI I am currently running the effort of doing the migration to wlr_scene #6562. My branch rewrites this code in question. I haven't look at the code too closely but it seems to me like this patch fixes a NULL deference but introduces a use-after-free/assert. If an output got disabled while a popup was active, the sway_output is now freed, and since we're no longer nulling out the sway_output on line 286 things will start trying to read/write from garbage memory potentially if a transaction happens in the meantime. On the flip side, if a popup happens while all outputs are disabled, then handle_layer_shell_surface won't assign a output and NULL will start pinging off the walls anyway.

tchebb · 2022-02-12T00:11:24Z

in the meantime

Can you elaborate on exactly which "meantime" you mean here? It's quite possible that this code introduces a potential use-after-free; like I noted, I'm not totally confident in my understanding of the lifecycle here.

My current understanding is that this change is safe because we still NULL out the pointer and remove the appropriate list items later on, in handle_destroy() (see line 373). I think handle_destroy() runs synchronously shortly after handle_output_destroy() because of the latter's call to wlr_layer_surface_v1_destroy(), which signals our sway_layer's destroy callback. Please correct me if I'm wrong, though.

tchebb · 2022-02-12T00:13:50Z

if a popup happens while all outputs are disabled, then handle_layer_shell_surface won't assign a output and NULL will start pinging off the walls anyway

I did consider this case as well, and my commit message addresses it:

The former case doesn't cause any problems, since we haven't even allocated a sway_layer_surface at that point or registered any callbacks.

Edit: actually, what I considered was when an entirely new layer appears while all outputs are disabled, not when a popup of an existing layer appears. But I don't think the latter case can ever happen, since we destroy a layer as soon as its output goes away. Is there any time in which it would have a chance to make a popup after the output goes away but before it's destroyed?

Nefsen402 · 2022-02-12T00:28:04Z

Can you elaborate on exactly which "meantime" you mean here? It's quite possible that this code introduces a potential use-after-free; like I noted, I'm not totally confident in my understanding of the lifecycle here.

Thinking about this more, I don't think this is possible because wl_signals get called synchronously. This is what I was thinking:
Transactions in sway get committed when all the surfaces ack their configure events. When an output disappears, then sway will rearrange all the windows sending configure events. When all the surfaces acked the reconfigures, then that's when the old sway_output will get freed (along with friends). It might be possible for the sway_output to get freed before the layer_shell gets notified of the output destroy - but isn't actually.

actually, what I considered was when an entirely new layer appears while all outputs are disabled, not when a popup of an existing layer appears. But I don't think the latter case can ever happen, since we destroy a layer as soon as its output goes away. Is there any time in which it would have a chance to make a popup after the output goes away but before it's destroyed?

Right, nevermind. This case isn't possible.

This LGTM.

tchebb · 2022-04-12T17:55:01Z

Since #6844 is still under development, can we merge this in the meantime? Also nice to have a more correct version of the old code in the Git history, even once the rewrite replaces it. cc @emersion

emersion

Yup, this LGTM!

MayeulC · 2022-06-08T13:56:27Z

Thanks a lot for this fix!

I'm wondering when it will land in a release? I just hit that issue (#6855 to be precise, I disconnected a laptop from the docking station without opening it), and it seems to be hit quite frequently, going by the comments on various issues.

What are the release plans for the next version? If the next release is far away, would you mind doing a point release with just this change (and possibly other fixes)?

Thank you again for your work!

tchebb force-pushed the fix-6120 branch from 7d3b52b to ae0fb49 Compare February 6, 2022 20:48

tchebb force-pushed the fix-6120 branch from ae0fb49 to 4db1e52 Compare February 6, 2022 21:13

Nefsen402 mentioned this pull request Feb 21, 2022

Use wlroots scene graph api #6844

Merged

5 tasks

emersion approved these changes Apr 13, 2022

View reviewed changes

emersion merged commit d726e50 into swaywm:master Apr 13, 2022

arbrauns mentioned this pull request Apr 13, 2022

Crash when destroying output #5804

Closed

tchebb deleted the fix-6120 branch April 18, 2023 00:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

layer_shell: keep output non-NULL wherever possible #6820

layer_shell: keep output non-NULL wherever possible #6820

tchebb commented Feb 6, 2022 •

edited

Loading

tchebb commented Feb 6, 2022 •

edited

Loading

Nefsen402 commented Feb 11, 2022

tchebb commented Feb 12, 2022 •

edited

Loading

tchebb commented Feb 12, 2022 •

edited

Loading

Nefsen402 commented Feb 12, 2022

tchebb commented Apr 12, 2022

emersion left a comment

MayeulC commented Jun 8, 2022 •

edited

Loading

layer_shell: keep output non-NULL wherever possible #6820

layer_shell: keep output non-NULL wherever possible #6820

Conversation

tchebb commented Feb 6, 2022 • edited Loading

tchebb commented Feb 6, 2022 • edited Loading

Nefsen402 commented Feb 11, 2022

tchebb commented Feb 12, 2022 • edited Loading

tchebb commented Feb 12, 2022 • edited Loading

Nefsen402 commented Feb 12, 2022

tchebb commented Apr 12, 2022

emersion left a comment

Choose a reason for hiding this comment

MayeulC commented Jun 8, 2022 • edited Loading

tchebb commented Feb 6, 2022 •

edited

Loading

tchebb commented Feb 6, 2022 •

edited

Loading

tchebb commented Feb 12, 2022 •

edited

Loading

tchebb commented Feb 12, 2022 •

edited

Loading

MayeulC commented Jun 8, 2022 •

edited

Loading