Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Linux] Repeatedly reopening GUI crashes host, asserts in cairo-xcb-screen.c #334

Open
madah81pnz1 opened this issue Jan 6, 2025 · 1 comment · May be fixed by #335
Open

[Linux] Repeatedly reopening GUI crashes host, asserts in cairo-xcb-screen.c #334

madah81pnz1 opened this issue Jan 6, 2025 · 1 comment · May be fixed by #335

Comments

@madah81pnz1
Copy link

This looks to be the same as #249 but since that one was closed with a fix, I created this new issue, since there might not be the same root cause, even if it crashes at the same assert.

If I have a single plugin instance open with GUI, repeatedly clicking the UI button in Reaper will eventually lead to a crash, after about 2-3 times.
If I have two plugin instances open, both with their GUI open, it never crashes as long as at least one has their GUI open.

In my plugin, when the last GUI is closed, after the frame is deleted, I call VSTGUI::X11::RunLoop::exit() and then VSTGUI::exit().
The crash happens the next time GUI is created and opened again, as then Cairo doesn't seem to find any screens(?)
For debug builds of my plugin, it takes much longer time to trigger the crash, so I suspect there is some race condition when Cairo is uninitialized.

As a workaround, I changed so that RunLoop::exit() and VSTGUI::exit() are never called my plugin, which seems to solve the crashing problem. But then it might leak some resources instead, so not a good long term solution.

Reaper 7.29
Debian 12 Testing(Trixie)

Latest VSTGUI from git develop branch:

commit ad46cf69c1b242f799764611b82c5af4c34ea95e (HEAD -> develop, origin/develop, origin/HEAD)
Author: scheffle <[email protected]>
Date:   Fri Dec 13 16:57:22 2024 +0100

    fix autocorrection typo

Stacktrace from gdb:

reaper: ../../../src/cairo-xcb-screen.c:219: _get_screen_index: Assertion `!"reached"' failed.

Thread 1 "reaper" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
warning: 44	./nptl/pthread_kill.c: No such file or directory
--Type <RET> for more, q to quit, c to continue without paging--c
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff798acef in __pthread_kill_internal (threadid=<optimized out>, signo=6) at ./nptl/pthread_kill.c:78
#2  0x00007ffff7936c42 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff791f4f0 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff791f418 in __assert_fail_base
    (fmt=0x7ffff7aa3ca0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7ffff6e1a4df "!\"reached\"", file=file@entry=0x7ffff6e15558 "../../../src/cairo-xcb-screen.c", line=line@entry=219, function=function@entry=0x7ffff6e289d0 "_get_screen_index") at ./assert/assert.c:94
#5  0x00007ffff792f552 in __assert_fail
    (assertion=0x7ffff6e1a4df "!\"reached\"", file=0x7ffff6e15558 "../../../src/cairo-xcb-screen.c", line=219, function=0x7ffff6e289d0 "_get_screen_index")
    at ./assert/assert.c:103
#6  0x00007ffff6ddf085 in ??? () at /lib/x86_64-linux-gnu/libcairo.so.2
#7  0x00007ffff6de1803 in cairo_xcb_surface_create () at /lib/x86_64-linux-gnu/libcairo.so.2
#8  0x00007ffff31866bc in VSTGUI::X11::DrawHandler::DrawHandler (this=0x12deaa8, window=...) at /home/matte/dev/common/vstgui/vstgui/lib/platform/linux/x11frame.cpp:171
#9  VSTGUI::X11::Frame::Impl::Impl (this=0x12dea80, parent=<optimized out>, size=..., frame=0x12f51a0)
    at /home/matte/dev/common/vstgui/vstgui/lib/platform/linux/x11frame.cpp:327
#10 0x00007ffff31837cc in VSTGUI::X11::Frame::Frame
    (this=0x1c48020, frame=0x12f51a0, size=..., parent=8417173, config=<optimized out>, __in_chrg=<optimized out>, __vtt_parm=<optimized out>)
    at /home/matte/dev/common/vstgui/vstgui/lib/platform/linux/../../cpoint.h:19
#11 0x00007ffff31606da in VSTGUI::makeOwned<VSTGUI::X11::Frame, VSTGUI::IPlatformFrameCallback*&, VSTGUI::CRect const&, unsigned long&, VSTGUI::IPlatformFrameConfig*&> ()
    at /home/matte/dev/common/vstgui/vstgui/lib/platform/linux/../../vstguibase.h:468
#12 VSTGUI::LinuxFactory::createFrame
    (this=this@entry=0x12ce910, frame=frame@entry=0x12f51a0, size=..., parent=parent@entry=0x806f95, parentType=parentType@entry=VSTGUI::PlatformType::kDefaultNative, config=0x0) at /home/matte/dev/common/vstgui/vstgui/lib/platform/linux/linuxfactory.cpp:96
#13 0x00007ffff3131dda in VSTGUI::CFrame::open (this=0x12f5180, systemWin=0x806f95, systemWindowType=VSTGUI::PlatformType::kDefaultNative, config=0x0)
    at /home/matte/dev/common/vstgui/vstgui/lib/cframe.cpp:209

For reference, similar issue with Cairo:
https://cairo.cairographics.narkive.com/eXgzeIgH/concerns-regarding-xcb-backend

@madah81pnz1
Copy link
Author

I did some troubleshooting and I think I've figured out what's going on.

_cairo_xcb_connection_get() in cairo-xcb-connection.c uses a global connection cache, static cairo_list_t connections.
So if the xcb_connection pointer value happens to match an existing pointer in this cache, it will return the wrong cairo_xcb_connection. This explains why the crash doesn't happen every time, because in most cases you'll get a new pointer value for the newly allocated xcb_connection, but sometimes the same memory address gets reused.

_device_finish() in cairo-xcb-connection.c is the function that would have removed the connection from this global cache.
This happens when cairo_device_destroy() is called and the device refcount reaches 0.

I printed the refcount from CairoGraphicsDevice() and ~CairoGraphicsDevice(), using cairo_device_get_reference_count(impl->device). The value seems to never reach 0.
The first time when printed in the constructor, the value is 3 (before the call to cairo_device_reference), and in the destructor, the value is 11 (before the call to cairo_device_destroy). When I repeatedly open/close the plugin GUI, the refcount goes up and down, but seems to stay at 11 afterwards every time.

Before commit a8b5c81, ~DrawHandler() used to have a call to cairo_device_finish(). This worked as in didn't crash, but only because it would forcefully remove and cleanup the cairo device. But then there are still ~10 other places that holds a reference count to it from somewhere. This is the main bug, a resource leak.

CairoGraphicsDeviceFactory is deleted from VSTGUI::exit() via exitPlatform(). If VSTGUI::exit() is not called, the existing device would have been found and reused by the factory, which also holds a cache of devices.

A fix is then to call cairo_device_finish() on each device in the CairoGraphicsDeviceFactory::Impl destructor:

struct CairoGraphicsDeviceFactory::Impl
{
	std::vector<std::shared_ptr<CairoGraphicsDevice>> devices;
	
	Impl() = default;
	~Impl()
	{
		for (auto& device : devices)
		{
			cairo_device_finish (device->get());
		}
	}
};

Even if it is somewhat hacky fix, when the factory is deleted, the user has anyway called VSTGUI::exit(). So at that point, cairo should also be cleaned up.
The risk is that it might crash somewhere else instead, since I still don't know where those ~10 reference counts comes from.
Also I'm not sure if cairo_device_finish() actually frees the memory for the device, so there is still the resource leakage. Each time the GUI is opened, a new xcb_connection is created which is then never freed.

madah81pnz1 added a commit to madah81pnz1/vstgui that referenced this issue Jan 9, 2025
Cairo keeps a global cache list of connections, but since the reference
count of the device never reaches 0, the devices are never removed from
this cache list.

The crash happens when a new xcb_connection is made, and the new pointer
happens to be the same as an existing pointer in this cache list, then
the wrong cairo_xcb_connection is returned. Then no screen can be found,
since the root is not the same for that old xcb_connection and the new
one. Thus we end up at ASSERT_NOT_REACHED in _get_screen_index(), in
cairo-xcb-screen.c

Note that there is some resource leakage somewhere, since the reference
count of the cairo devices should reach 0 at some point, but never do.

Fixes steinbergmedia#334
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant