Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wayland: Only call wl_egl_window_resize in SwapWindow #4821

Closed
wants to merge 2 commits into from

Conversation

sulix
Copy link
Contributor

@sulix sulix commented Oct 10, 2021

This is a slightly hacky change to get multithreaded SDL applications working under nVidia's wayland implementation. It feels vaguely related to #4563 — ultimately it's about deferring resizes until there's a new frame.

On nVidia's EGLStreams wayland implementation (at least under KDE/KWin), some applications will hang on window creation or resize. This seems to be the result of wl_egl_window_resize() being called in the middle of a frame (or possibly from another thread).

Instead, only apply the new window size immediately after SwapBuffers, which seems to work better. There's probably more to be done to avoid any mismatches in window size everywhere, but with this change, Impostor Factory now works consistently under nVidia/Wayland, and Transistor at least starts, before hitting its usual bugs.

Ultimately, there's probably a better solution here which defers more of the resize to the correct point in SwapWindow, à la 785dfdf — that patch on its own doesn't fix anything on nVidia, however. I'm not sure how Vulkan will work, either: this at least doesn't break Vulkan on Mesa, though.

@flibitijibibo
Copy link
Collaborator

Huh, isn't this basically the old hack we had pre-2.0.16? I mean, I wouldn't object to bringing it back if that's really how it's supposed to work, just want to make sure we're not rolling back behavior by accident.

@dos1 is probably the best reviewer for this one, might help fix the Weston error.

@flibitijibibo flibitijibibo self-assigned this Oct 11, 2021
@flibitijibibo flibitijibibo added this to the 2.0.18 milestone Oct 11, 2021
@dos1
Copy link
Contributor

dos1 commented Oct 11, 2021

Bringing back the old behavior will surely fix #4563. I know nothing about Nvidia implementation and can't test there though.

This commit isn't right though, as you still reconfigure surface properties using new size before actual resize happens. You would rather want to actually bring back the old code that got reverted in 871c111 (and xdg_surface_set_window_geometry calls should then be removed, as they basically workaround crashes caused by #4563) - but then we need to evaluate this against #4326, which was the motivation behind changing the old code.

On nVidia's EGLStreams wayland implementation (at least under KDE/KWin),
some applications will hang on window creation or resize. This seems to
be the result of wl_egl_window_resize() being called in the middle of a
frame (or possibly from another thread).

Instead, only apply the new window size immediately after SwapBuffers,
which seems to work better. There's probably more to be done to avoid
any mismatches in window size everywhere, but with this change, Impostor
Factory now works consistently under nVidia/Wayland, and Transistor at
least starts, before hitting its usual bugs.
@sulix
Copy link
Contributor Author

sulix commented Oct 11, 2021

Okay: having a quick look into the Mesa issues linked in #4326, it definitely looks like this is a bug with nVidia's egl implementation, and wl_egl_window_resize should be okay to call whenever.

So, I guess the real question is whether or not it's worth working around this issue until nVidia fixes it (and the fix has rolled out to enough people). Given the state of nVidia/Wayland support, I have my doubts that anyone's relying too heavily on it at the moment. It also looks like nVidia are going to be dropping a driver which re-does most of their egl implementation to use wl_drm imminently: NVIDIA/egl-wayland#40 (comment) — I don't know if that'll fix this, though.

In the meantime, I've updated the patch here a bit to defer a bit more of the resizing in a few more cases, which is enough to get Life is Strange running under nVidia/Wayland. It seemed to use SDL_SetWindowSize(), which was still calling wl_egl_window_resize directly. It's still likely missing quite a few cases, and I haven't tested it with libdecor: who knows how broken that'd be. Alas, enough of the code was cleaned up after 871c111 that reverting it isn't quite as simple as would be ideal.

@dos1
Copy link
Contributor

dos1 commented Oct 11, 2021

Your new commit has exact same changes as the old one.

@flibitijibibo
Copy link
Collaborator

In any case we should report this to NV so it makes its way into the 470 series. It sounds like we have real world games they can test against so this should be something we can file.

@flibitijibibo
Copy link
Collaborator

Tested this locally, honestly if it fixes #4563 I don't think I'd mind this... the issue in #4326 is actually still fixed here because the issue is that we weren't calling the whole function for HandlePendingResize, while this just defers wl_egl_window_resize, so the timing is still okay here. Sadly still no luck for #4571 which is expected (and between all the issues, it's this one I care about most, the rest is just the equivalent of -Wpedantic in comparison).

@dos1
Copy link
Contributor

dos1 commented Oct 11, 2021

the issue is that we weren't calling the whole function for HandlePendingResize, while this just defers wl_egl_window_resize, so the timing is still okay here

We have to defer most of that function though, not just wl_egl_window_resize, as not doing so will likely cause other issues. I guess it may be worth trying deferring everything from there except SDL_SendWindowEvent.

@flibitijibibo
Copy link
Collaborator

I guess it may be worth trying deferring everything from there except SDL_SendWindowEvent.

Agreed... anyone want to give this a shot? I'll be caught up in a Switch thing today so it'll be a minute before I can try moving this stuff around.

In addition to deferring the wl_egl_window_resize() call for OpenGL(ES)
windows, defer all other wayland calls.

The window message is still sent, and the window data (width, height,
scale factor) are set immediately.

This doesn't seem to break anything I've tried, but the early 'return'
calls are a little ugly.
@sulix
Copy link
Contributor Author

sulix commented Oct 12, 2021

Agreed... anyone want to give this a shot? I'll be caught up in a Switch thing today so it'll be a minute before I can try moving this stuff around.

I did actually try out a version which did this, so I've pushed that (I thought I'd pushed it last night, but apparently not…). It's a little ugly (it just returns out of the functions early, and adds a new copy of the resize code, rather than nicely refactoring it), and it updates the window data immediately (and the scale factor), but it hasn't obviously caused any regressions I've found.

@erik-kz
Copy link

erik-kz commented Oct 14, 2021

Hello from NVIDIA, if it helps clear things up, this is indeed a known bug in our EGL Wayland code that we're tracking internally. Specifically, calling wl_egl_window_resize from a different thread than the one to which the Window's EGLSurface is current doesn't work properly. You can see here https://github.com/NVIDIA/egl-wayland/blob/ce4c9635fb3121ef59a82eace1a29125d21b798b/src/wayland-eglsurface.c#L1559 that we basically destroy the old EGLSurface, create a new one, and then call eglMakeCurrent from whatever thread did the resize. This isn't specific to KDE, it would affect any compositor.

@flibitijibibo
Copy link
Collaborator

Checking on this as we start triaging toward 2.0.18... where are we with this? Do we want to mark this as NOTOURBUG or do we want to try and get this in for #4563?

@flibitijibibo
Copy link
Collaborator

Going to bump this to 2.0.20 - as David said there's probably a better way to do this, and part of this may end up not being too critical as NVIDIA continues pushing forward on their Wayland support. With both things in mind it feels like we can just reconvene on this after NV's New Feature Branch stabilizes, which buys #4563 more time.

@flibitijibibo flibitijibibo modified the milestones: 2.0.18, 2.0.20 Nov 15, 2021
@flibitijibibo flibitijibibo added the notourbug This bug is caused by an external component label Jan 13, 2022
@flibitijibibo
Copy link
Collaborator

flibitijibibo commented Feb 19, 2022

Meant to ask this after 510 but completely forgot: What does this bug look like with the new GBM support?

@sulix
Copy link
Contributor Author

sulix commented Feb 21, 2022

I tried the GBM backend today, and this issue still seems to be present: Life is Strange hangs with a black screen on startup if forced to use Wayland.

Note that my nVidia machine isn't quite running bleeding-edge-enough stuff for nVidia/GBM to be stable enough for everyday use for me, so my testing of it is a bit sporadic and may be affected by other issues, particularly what appear to be the various linked Qt/KDE issues here:
NVIDIA/egl-wayland#40

Note that, in general, there don't appear to have been any fixes for this bug in that egl-wayland repository, so I doubt this is supposed to be fixed by the GBM/EGLstreams change.

@flibitijibibo
Copy link
Collaborator

This is at risk of being bumped to .24 - would be great to know if there's an upcoming driver for this but at the same time reintroducing the deferred resize may be a part of #4563 if that doesn't get bumped as well.

@flibitijibibo flibitijibibo modified the milestones: 2.0.22, 2.0.24 Mar 30, 2022
sulix added a commit to sulix/egl-wayland that referenced this pull request Apr 4, 2022
Applications which call wl_egl_window_resize() in the middle of a frame
from another thread will hang, as MakeCurrent() is called from the
wrong thread.

The obvious fix for this is to delay the resize until the next
eglSwapBuffers() call. An SDL-side implementation of this exists:
libsdl-org/SDL#4821
But since some things should update immediately on resize (even if the
attached buffer doesn't), it seems better to have this in egl.

This patch fixes games such as:
- Imposter Factory:
https://store.steampowered.com/app/1182620/Impostor_Factory/
- Life is Strange:
https://store.steampowered.com/app/319630/Life_is_Strange__Episode_1/

Both would hang on startup if run against a wayland-enabled build of
SDL, but work perfectly with this patch.

Signed-off-by: David Gow <[email protected]>
@sulix
Copy link
Contributor Author

sulix commented Apr 4, 2022

I finally got around to fixing Wayland on my nVidia machine (the nVidia/Wayland/KDE is seriously broken on Ubuntu/Debian at the moment, so I've switched this machine to openSUSE), and can confirm the issue is still present.

I've filed NVIDIA/egl-wayland#52 with the egl-wayland project (as while there's apparently an internal issue tracked, it'd be nice to have something public to point to), as well as porting a similar defer-all-resizes workaround here:
NVIDIA/egl-wayland#53

Given that Ubuntu is planning to make nVidia/Wayland the default in 22.04 LTS — which seems excessively brave to me — there's going to need to be either a fix or a workaround for this somewhere: at this point we're largely saved by none of the affected games having Wayland enabled on their SDL builds, but that won't last…

@flibitijibibo
Copy link
Collaborator

Having this in NV egl-wayland seems like the right thing to do, since other hardware isn't affected AFAICT. Introducing the workaround on our end just introduces a regression for everybody, so I'm not sure we want this patch at this point...?

(It looks like all distributions are going to be defaulting to Wayland on NV hardware in spite of things like this and some of the transparency work that hasn't been released yet, so I wonder if something's coming up soon and it's not public info yet.)

sulix added a commit to sulix/egl-wayland that referenced this pull request Apr 5, 2022
Applications which call wl_egl_window_resize() in the middle of a frame
from another thread will hang, as MakeCurrent() is called from the
wrong thread.

The obvious fix for this is to delay the resize until the next
eglSwapBuffers() call. An SDL-side implementation of this exists:
libsdl-org/SDL#4821
But since some things should update immediately on resize (even if the
attached buffer doesn't), it seems better to have this in egl.

This patch fixes games such as:
- Imposter Factory:
https://store.steampowered.com/app/1182620/Impostor_Factory/
- Life is Strange:
https://store.steampowered.com/app/319630/Life_is_Strange__Episode_1/

Both would hang on startup if run against a wayland-enabled build of
SDL, but work perfectly with this patch.

Signed-off-by: David Gow <[email protected]>
sulix added a commit to sulix/egl-wayland that referenced this pull request Apr 5, 2022
Applications which call wl_egl_window_resize() in the middle of a frame
from another thread will hang, as MakeCurrent() is called from the
wrong thread.

The obvious fix for this is to delay the resize until the next
eglSwapBuffers() call. An SDL-side implementation of this exists:
libsdl-org/SDL#4821
But since some things should update immediately on resize (even if the
attached buffer doesn't), it seems better to have this in egl.

This patch fixes games such as:
- Imposter Factory:
https://store.steampowered.com/app/1182620/Impostor_Factory/
- Life is Strange:
https://store.steampowered.com/app/319630/Life_is_Strange__Episode_1/

Both would hang on startup if run against a wayland-enabled build of
SDL, but work perfectly with this patch.

Signed-off-by: David Gow <[email protected]>
sulix added a commit to sulix/egl-wayland that referenced this pull request Apr 28, 2022
Applications which call wl_egl_window_resize() in the middle of a frame
from another thread will hang, as MakeCurrent() is called from the
wrong thread.

The obvious fix for this is to delay the resize until the next
eglSwapBuffers() call. An SDL-side implementation of this exists:
libsdl-org/SDL#4821
But since some things should update immediately on resize (even if the
attached buffer doesn't), it seems better to have this in egl.

This patch fixes games such as:
- Imposter Factory:
https://store.steampowered.com/app/1182620/Impostor_Factory/
- Life is Strange:
https://store.steampowered.com/app/319630/Life_is_Strange__Episode_1/

Both would hang on startup if run against a wayland-enabled build of
SDL, but work perfectly with this patch.

Signed-off-by: David Gow <[email protected]>
@flibitijibibo
Copy link
Collaborator

Closing in favor of NVIDIA/egl-wayland#53, which was just approved on NV's end.

erik-kz pushed a commit to NVIDIA/egl-wayland that referenced this pull request May 17, 2022
Applications which call wl_egl_window_resize() in the middle of a frame
from another thread will hang, as MakeCurrent() is called from the
wrong thread.

The obvious fix for this is to delay the resize until the next
eglSwapBuffers() call. An SDL-side implementation of this exists:
libsdl-org/SDL#4821
But since some things should update immediately on resize (even if the
attached buffer doesn't), it seems better to have this in egl.

This patch fixes games such as:
- Imposter Factory:
https://store.steampowered.com/app/1182620/Impostor_Factory/
- Life is Strange:
https://store.steampowered.com/app/319630/Life_is_Strange__Episode_1/

Both would hang on startup if run against a wayland-enabled build of
SDL, but work perfectly with this patch.

Signed-off-by: David Gow <[email protected]>
MartinPulec added a commit to MartinPulec/UltraGrid that referenced this pull request May 19, 2023
In Wayland, when resizing windows programmatically glfwSetWindowSize(),
if swap buffers was not triggered, event returning back to the original
resolution is triggered when swap buffer is triggered just after
drawing. It looks like that no drawing should occur before window size
change and swap buffers, seems there has been reports for SDL, eg. [1].

Also do not call gl_resize() after glfw_resize_window() (needless, it
will be triggered via callback after swapping the buffers).

To reproduce the wrong behavior - use Wayland with GLFW native Wayland
build, then: `uv -t testcard -d gl` (size may be any other than the
splashscreen size 512x512).

[1] libsdl-org/SDL#4821
MartinPulec added a commit to CESNET/UltraGrid that referenced this pull request May 19, 2023
In Wayland, when resizing windows programmatically glfwSetWindowSize(),
if swap buffers was not triggered, event returning back to the original
resolution is triggered when swap buffer is triggered just after
drawing. It looks like that no drawing should occur before window size
change and swap buffers, seems there has been reports for SDL, eg. [1].

Also do not call gl_resize() after glfw_resize_window() (needless, it
will be triggered via callback after swapping the buffers).

To reproduce the wrong behavior - use Wayland with GLFW native Wayland
build, then: `uv -t testcard -d gl` (size may be any other than the
splashscreen size 512x512).

[1] libsdl-org/SDL#4821
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notourbug This bug is caused by an external component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants