-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix "Loading Track .." forever #1446
Conversation
I remember that I might have discovered both issues in the past. Do you have some simple instructions how to reproduce both reliably for verification? Maybe even by injecting a line of code to trigger the issues? I'm too lazy to figure it out myself ;) |
On Linux ALSA you can user 2.67 ms buffer size and provoke a buffer underrun, this makes the audio callback not return for a long time (at least on my system). This is a pending bug. |
src/engine/engineworkerscheduler.cpp
Outdated
while (i != m_workerStates.end()) { | ||
if (i.key() && i.value()) { | ||
i.key()->wake(); | ||
i.value() = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a typical race condition: The worker thread might re-schedule itself after woken up before the scheduler thread resets the flag to false. Even if the probability might be low, those are the critical spots that distinguish correct from wrong multi-threaded code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can ensure that the worker is not running during this loop then switching both lines would be sufficient. Without the "not running" assumption you need to use an atomic flag and clear() instead of a simple boolean. This is what I would prefer instead of relying on some implicit assumptions that may change at any time without notice.
src/engine/engineworkerscheduler.h
Outdated
@@ -30,7 +31,7 @@ class EngineWorkerScheduler : public QThread { | |||
// runWorkers was run. This should only be touched from the engine callback. | |||
bool m_bWakeScheduler; | |||
|
|||
FIFO<EngineWorker*> m_scheduleFIFO; | |||
QHash<EngineWorker*, bool> m_workerStates; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we ensure that QHash is never modified from another thread while being read/iterated from the scheduler thread? This should be documented in every detail, because we are in multi-threaded land here! I don't recognize it at first sight.
src/engine/engineworkerscheduler.cpp
Outdated
@@ -27,11 +24,17 @@ void EngineWorkerScheduler::workerReady(EngineWorker* pWorker) { | |||
// If the write fails, we really can't do much since we should not block | |||
// in this slot. Write the address of the variable pWorker, since it is | |||
// a 1-element array. | |||
m_scheduleFIFO.write(&pWorker, 1); | |||
m_workerStates[pWorker] = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs at least a debug assertion that the worker has been registered before to avoid modifying QHash, But as already explained I'm not happy with the decision to use QHash here, at least not without additional safety lanes in form of many debug assertions that validate our assumptions.
Now it should be more save, hopefully. |
src/engine/engineworkerscheduler.cpp
Outdated
EngineWorker* pWorker = NULL; | ||
while (m_scheduleFIFO.read(&pWorker, 1) == 1) { | ||
if (pWorker) { | ||
m_mutex.lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QMutexLocker provides all the functionality you need in a safe way.
The current implementation was lock free when adding workers and workers had to be re-added after scheduled. With your changes the worker registration is not lock-free any more and workers are scheduled forever once they have been registered. Can you explain shortly why the new version is more appropriate than the previous one and why it will not new cause any new issues? I'm not happy with the current design with all those delegations and cyclic dependencies. But we don't need to fix this now. |
src/engine/engineworker.cpp
Outdated
@@ -16,12 +16,17 @@ void EngineWorker::run() { | |||
|
|||
void EngineWorker::setScheduler(EngineWorkerScheduler* pScheduler) { | |||
m_pScheduler = pScheduler; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should at least check by a debug assertion that the worker currently doesn't have any scheduler when this function is called. Otherwise it would be scheduled twice.
src/engine/engineworkerscheduler.cpp
Outdated
if (pWorker) { | ||
m_mutex.lock(); | ||
for(const auto& pWorker: m_workers) { | ||
if (pWorker->isReady()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worker should decide internally if it should actually wake up when triggered. With this modification std::atomic_flag looks like a perfect choice for m_ready.
The workers are only added during initialization and if a skin is loaded with more samplers.
No, they are sleeping on m_sema unit they become ready. We have an extra check of the m_notReady flag though.
The Fifo suffers overflows when in overload conditions. The current solution cannot overflow. As usual I cannot guarantee that it will not suffer any new issues. At least I see none.
If we consider these changes as too risky, I can put them into a new PR targeted to master instead. |
I have rebased in on 2.1 and moved the unrelated commits to #1547 |
Notes addressed. |
} | ||
|
||
void EngineWorker::wakeIfReady() { | ||
if (!m_notReady.test_and_set()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn. I assumed that clear() returns the previous state as a result.
Does it get easier to read if we rename m_notReady to m_idle? On the other hand this would introduce another term around "ready", so let's better leave it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought, and notReady is finally the least hurting name.
while (m_scheduleFIFO.read(&pWorker, 1) == 1) { | ||
if (pWorker) { | ||
pWorker->wake(); | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to introduce separate scopes here, just use unlock() and relock().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean:
while (!m_bQuit) {
Event::start("EngineWorkerScheduler");
QMutexLocker lock(&m_mutex);
for(const auto& pWorker: m_workers) {
pWorker->wakeIfReady();
}
lock.unlock();
Event::end("EngineWorkerScheduler");
lock.relock();
if (!m_bQuit) {
// Wait for next runWorkers() call
m_waitCondition.wait(&m_mutex); // unlock mutex and wait
}
}
I can do this, but it looks less nice for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Than I like to s tick with the current version.
Anything else to do?
LGTM |
This fixes
https://bugs.launchpad.net/mixxx/+bug/1740448
Both rare bugs can be simulated by stopping the audio callback.
Can this be part of the hard to catch race condition in the analyzer?