Is your feature request related to a problem? Please describe.
This report is similar to #1160 because a resource, in this case a message queue, gets closed while it's still being used by another thread.
The test passes silently on Linux because of details of its mqueue implementation. The macOS implementation of mqueue in #1161 locks a mutex in each of its functions, including mq_close.
The deadlock results from the following collision:
The main thread calls OS_QueueDelete, which calls mq_close, which tries to acquire the queue's mutex.
task1, which was deleted just before, was still waiting in mq_timedreceive and holding the queue's mutex.
status = OS_TimerDelete(timer_id);
UtAssert_True(status == OS_SUCCESS, "Timer delete Rc=%d", (int)status);
// When a task is deleted below, its executing thread gets cancelled and destroyed, but at that moment
// task1 is still waiting on the message queue.
status = OS_TaskDelete(task_1_id);
UtAssert_True(status == OS_SUCCESS, "Task 1 delete Rc=%d", (int)status);
status = OS_QueueDelete(msgq_id); // deadlock here
UtAssert_True(status == OS_SUCCESS, "Queue 1 delete Rc=%d", (int)status);
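The collision can be reproduced outside OSAL with plain pthreads. The following is a minimal sketch, not the code from #1161: `demo_queue`, `demo_receiver` and `demo_cancel_and_probe` are hypothetical names, and the sketch assumes the queue is guarded by a default mutex plus a condition variable. It relies on the POSIX rule that a thread cancelled inside pthread_cond_wait re-acquires the mutex before it terminates, so without a cleanup handler the mutex stays locked forever.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for the queue state; not the real mqueue header. */
struct demo_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int             waiting; /* set once the receiver is about to block */
};

/* Mimics a task blocked in a receive call on an empty queue. */
static void *demo_receiver(void *arg)
{
    struct demo_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    q->waiting = 1;
    for (;;) {
        /* Cancellation point. If the thread is cancelled here, the mutex is
         * re-acquired first, and nothing in this function ever unlocks it. */
        pthread_cond_wait(&q->not_empty, &q->lock);
    }
}

/* Cancels the blocked receiver, then probes the mutex without blocking.
 * Returns the pthread_mutex_trylock() result. */
static int demo_cancel_and_probe(void)
{
    struct demo_queue q;
    pthread_t tid;
    int waiting = 0;
    int rc;

    q.waiting = 0;
    rc = pthread_mutex_init(&q.lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&q.not_empty, NULL);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, demo_receiver, &q);
    assert(rc == 0);

    /* Wait until the receiver has set the flag; once we can take the lock
     * and see the flag, the receiver is inside pthread_cond_wait(). */
    while (!waiting) {
        pthread_mutex_lock(&q.lock);
        waiting = q.waiting;
        pthread_mutex_unlock(&q.lock);
        if (!waiting)
            sched_yield();
    }

    pthread_cancel(tid);     /* plays the role of OS_TaskDelete */
    pthread_join(tid, NULL);

    /* A blocking lock here would hang, like mq_close in this report. */
    return pthread_mutex_trylock(&q.lock);
}
```

On a POSIX-conforming implementation, `demo_cancel_and_probe()` is expected to return EBUSY: the receiver thread no longer exists, yet the mutex it held is still locked.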
Describe the solution you'd like
One could argue that the queue-test works on Linux and that the macOS implementation of mqueue should therefore accommodate it. At the same time, as with #1160, the macOS implementation appears to highlight that the queue-test relies on undefined behavior.
Describe alternatives you've considered
For now, there is a custom hack to trylock on the mqueue's mutex before actually trying to close the mqueue.
// TODO: without this trylock, queue-test deadlocks on macOS
if ((n = pthread_mutex_trylock(&mqhdr->mqh_lock)) == EBUSY)
{
    (void)pthread_mutex_unlock(&mqhdr->mqh_lock);
}
if ((n = pthread_mutex_lock(&mqhdr->mqh_lock)) != 0) {
    errno = n;
    return (-1);
}
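Note that the hack above unlocks a mutex that the calling thread does not own, which is undefined for a default mutex. Another option, sketched here under assumptions, is to register a cancellation cleanup handler in the waiting path so that a cancelled receiver releases the mutex itself. The names `fixed_queue`, `fixed_unlock`, `fixed_receiver` and `fixed_cancel_and_probe` are hypothetical, and whether this fits the structure of the implementation in #1161 has not been verified.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for the queue state; not the real mqueue header. */
struct fixed_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int             count;   /* number of queued messages */
    int             waiting; /* set once the receiver is about to block */
};

/* Cleanup handler: runs on cancellation and on the normal exit path. */
static void fixed_unlock(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

/* Receiver that releases the mutex even if it is cancelled while waiting. */
static void *fixed_receiver(void *arg)
{
    struct fixed_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    pthread_cleanup_push(fixed_unlock, &q->lock);
    q->waiting = 1;
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock); /* cancellation point */
    q->count--;
    pthread_cleanup_pop(1); /* normal path: run the handler, i.e. unlock */
    return NULL;
}

/* Cancels the blocked receiver, then probes the mutex without blocking.
 * Returns the pthread_mutex_trylock() result. */
static int fixed_cancel_and_probe(void)
{
    struct fixed_queue q;
    pthread_t tid;
    int waiting = 0;
    int rc;

    q.count = 0;
    q.waiting = 0;
    rc = pthread_mutex_init(&q.lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&q.not_empty, NULL);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, fixed_receiver, &q);
    assert(rc == 0);

    /* Wait until the receiver is blocked inside pthread_cond_wait(). */
    while (!waiting) {
        pthread_mutex_lock(&q.lock);
        waiting = q.waiting;
        pthread_mutex_unlock(&q.lock);
        if (!waiting)
            sched_yield();
    }

    pthread_cancel(tid);
    pthread_join(tid, NULL);

    rc = pthread_mutex_trylock(&q.lock);
    if (rc == 0)
        pthread_mutex_unlock(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return rc;
}
```

With the handler in place, `fixed_cancel_and_probe()` is expected to return 0, meaning a later close could take the lock normally. This only addresses the leaked mutex; deleting a queue that another task is still using remains a problem in the test itself.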
Additional context
See the stacktrace of the blocked thread below. Note that I cannot provide the second thread that holds the lock because that thread is pthread_cancelled by that point (and the mutex has leaked).
Requester Info
Stanislav Pankevich (Personal contribution)
Stacktrace:
Full test log. Note that the mqueue's mutex is locked 19 times but unlocked only 18 times.