Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #668 : Fix delete order of threads #846

Merged
merged 5 commits into from
Jun 28, 2020

Conversation

m-seker
Copy link
Contributor

@m-seker m-seker commented Jun 27, 2020

Summary

Fixes #668

_ssdp and ssdpThread are deleted after ssdpThread finishes(HyperionDaemon::startNetworkServices().

connect(ssdpThread, &QThread::finished, _ssdp, &QObject::deleteLater);
connect(ssdpThread, &QThread::finished, ssdpThread, &QObject::deleteLater);

Looking at HyperionDaemon::freeObjects():

_ssdp->thread()->quit();
_ssdp->thread()->wait(1000);

As soon as quit is called, finished signal is emitted from the thread and both objects are destroyed. Hence the second call - wait(1000) SEGFAULTs since it tries to access a destroyed object.

The problem is sporadic because sometimes destruction happens after wait returns. You can reproduce it %100 if you put sleep between quit wait(1000):

_ssdp->thread()->quit();
usleep(2000000);
_ssdp->thread()->wait(1000);

Solution is postponing the delete until both quit and wait returns:

_ssdp->thread()->quit();
_ssdp->thread()->wait(1000);
_ssdp->deleteLater();
_ssdp->thread()->deleteLater();

What kind of change does this PR introduce? (check at least one)

  • Bugfix
  • Feature
  • Code style update
  • Refactor
  • Docs
  • Build-related changes
  • Other, please describe:

If changing the UI of web configuration, please provide the before/after screenshot:

Does this PR introduce a breaking change? (check one)

  • Yes
  • No

If yes, please describe the impact and migration path for existing setups:

The PR fulfills these requirements:

  • When resolving a specific issue, it's referenced in the PR's body (e.g. Fixes: #xxx[,#xxx], where "xxx" is the issue number)

If adding a new feature, the PR's description includes:

  • A convincing reason for adding this feature
  • Related documents have been updated (docs/docs/en)
  • Related tests have been updated

To avoid wasting your time, it's best to open a feature request issue first and wait for approval before working on it.

Other information:

@hyperion-project
Copy link

Hello @m-seker 👋

I'm your friendly neighborhood bot and would like to say thank you for
submitting a pull request to Hyperion!

So that you and other users can test your changes more quickly,
you can find your workflow artifacts here.

If you make changes to your PR, i create a new link to your workflow artifacts.

Best regards,
Hyperion-Project

@Lord-Grey
Copy link
Collaborator

@m-seker Looks like you have a good understanding of Qt and Cpp in general.
Maybe you would like to have a look at the python environment and the Effect-Engine Instances.
Python environment cores selectively too and if I am not mistaken the Engine Instances are not properly destructed....

Unfortunately, I have not had any time myself, but maybe it gives you some challenge you like to take up ...:)

@m-seker
Copy link
Contributor Author

m-seker commented Jun 27, 2020

@Lord-Grey sure, I'm pretty new to the codebase but I can give it a try :)
Can you give bit more detail on how to reproduce it ? Does it also happen during quit like the original issue?

A quick visual inspection for Effect.cpp and EffectEngine.cpp didn't reveal a potential crash area but only some memory leaks (active effect leaks on quit).

EDIT: I gave another look. If there are active effects in EventEngine(_activeEvents), the destructor doesn't take those into account and leaks during quit. Since we don't wait Effect thread to finish properly, it can be somewhere in the middle of Effect::run() method and we may be executing PythonInit::~PythonInit() while Effect is still running.

Something like this should eliminate that possibility (and the leak as well) :

EffectEngine::~EffectEngine()
{
    for (Effect * effect : _activeEffects)
    {
        effect->wait();
        effect->deleteLater();
    }
}

Probably this is not the issue you are talking about but better to mention when I'm on it.

@Lord-Grey
Copy link
Collaborator

@m-seker

on the Python problem, please find a stack-trace below.
I did not do a detailed investigation, but the scenario you outlining (effect running which stopping hyperion) may be the root cause. In if not you anyway have a point.
Besides the stacktrace, you find selected events of the hyperion log (which I filtered on the effect related ones) and those which are coming after the "kill" request.
To reproduce, set up a new hyperion systen, e.g. with a separate database using "-u" and then quickly after starting up (and while the effect is running kill hyperion. Unfortunately, the problem is not reproducible deterministically....

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
tCore was generated by `./build-x86x64/bin/hyperiond -d -u hue'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7fd3c9696800 (LWP 61457))]

(gdb) bt
#0  0x00007fd3cbedf3eb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fd3cbebe96e in __GI_abort () at abort.c:100
#2  0x00007fd3cd5d4db8 in  () at /usr/lib/x86_64-linux-gnu/libpython3.7m.so.1.0
#3  0x00007fd3cd6d6625 in  () at /usr/lib/x86_64-linux-gnu/libpython3.7m.so.1.0
#4  0x00007fd3cd6d35db in PyInterpreterState_Delete () at /usr/lib/x86_64-linux-gnu/libpython3.7m.so.1.0
#5  0x00007fd3cd6d6828 in Py_FinalizeEx () at /usr/lib/x86_64-linux-gnu/libpython3.7m.so.1.0
#6  0x0000560c2cd12e9f in PythonInit::~PythonInit() (this=0x560c2e47a600, __in_chrg=<optimised out>) at /home/thomas/hyperion.ng-lordgrey/libsrc/python/PythonInit.cpp:61
#7  0x0000560c2cc5a329 in HyperionDaemon::~HyperionDaemon() (this=0x560c2e4a7f40, __in_chrg=<optimised out>) at /home/thomas/hyperion.ng-lordgrey/src/hyperiond/hyperiond.cpp:148
#8  0x0000560c2cc5a386 in HyperionDaemon::~HyperionDaemon() (this=0x560c2e4a7f40, __in_chrg=<optimised out>) at /home/thomas/hyperion.ng-lordgrey/src/hyperiond/hyperiond.cpp:149
#9  0x0000560c2cc6c139 in main(int, char**) (argc=4, argv=0x7fff755f1b48) at /home/thomas/hyperion.ng-lordgrey/src/hyperiond/main.cpp:343

...
[hyperiond EFFECTENGINE] <INFO> Run effect "Rainbow swirl fast" on channel 0
[hyperiond EFFECTENGINE] <DEBUG> <EffectEngine.cpp:183:runEffectScript()> Start the effect: name [Rainbow swirl fast], smoothCfg [2]
[hyperiond EFFECTENGINE] <INFO> Run effect "Warm mood blobs" on channel 254
[hyperiond EFFECTENGINE] <DEBUG> <EffectEngine.cpp:183:runEffectScript()> Start the effect: name [Warm mood blobs], smoothCfg [2]
[hyperiond HYPERION] <INFO> Inital background effect 'Warm mood blobs' started
[hyperiond HYPERION] <INFO> Hyperion instance 'First LED Hardware instance' has been started
..

^C

[hyperiond DAEMON] <DEBUG> <hyperiond.cpp:167:freeObjects()> Cleaning up Hyperion before quit.
[hyperiond FLATBUFSERVER] <INFO> Stopped
[hyperiond PROTOSERVER] <INFO> Stopped
[hyperiond WEBSERVER] <INFO> Stopped Hyperion Webserver
[hyperiond WEBSERVER] <INFO> Previous line repeats 1 times
[hyperiond HYPERION] <INFO> Hyperion instance 'First LED Hardware instance' has been stopped
[hyperiond LEDDEVICE] <DEBUG> <LedDeviceFile.cpp:85:close()> File: /dev/null
[hyperiond LedDeviceWrapper] <INFO> LedDevice 'file' closed
[hyperiond V4L2:auto] <DEBUG> <GrabberWrapper.cpp:57:stop()> Grabber stop()
[hyperiond V4L2:auto] <DEBUG> <GrabberWrapper.cpp:43:~GrabberWrapper()> Close grabber: V4L2:auto
[hyperiond MAIN] <INFO> Application closed with code 0
[hyperiond DAEMON] <DEBUG> <PythonInit.cpp:59:~PythonInit()> Cleaning up Python interpreter
Fatal Python error: PyInterpreterState_Delete: remaining subinterpreters

@Lord-Grey
Copy link
Collaborator

Lord-Grey commented Jun 28, 2020

@m-seker

A quick visual inspection for Effect.cpp and EffectEngine.cpp didn't reveal a potential crash area but only some memory leaks (active effect leaks on quit).

Correct. That is what I remember vaguely, too.
There seems to be allocations with "new", but missing "delete"s.

@m-seker
Copy link
Contributor Author

m-seker commented Jun 28, 2020

@Lord-Grey I created the following PR for fixing the mentioned problem:

#850

Unfortunately I couldn't reproduce the crash. Maybe when you have time you can give the proposed fix a try after it is merged (or with PR artifacts) and see if it is still reproducible ?

@Paulchen-Panther
Copy link
Member

Paulchen-Panther commented Jun 28, 2020

I can confirm that this PR works wonderfully in conjunction with PR 850. Thank you for your work. 👍

@Paulchen-Panther Paulchen-Panther added the HotFix This PR fixes a major bug label Jun 28, 2020
@brindosch
Copy link
Contributor

Thanks you :)

@hyperion-project
Copy link

Here is your new link to your workflow artifacts.

@brindosch brindosch merged commit a68ed7d into hyperion-project:master Jun 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
HotFix This PR fixes a major bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hyperion cores selectively during quit
4 participants