Fix lock() and unlock() #1
base: master
Conversation
While lock hierarchies are necessary with multiple mutexes, they only apply when multiple locks are being acquired at the same time. At any one point, you should not attempt to acquire a lock higher in the hierarchy than any lock that you already have, but you don't have to lock every lock in the hierarchy to get to the one that guards the resource you want. Absent anything else in the code that could cause stalls, a thread that goes:
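(a hedged reconstruction of the snippet being described — A and B stand for the two mutexes, with A higher in the hierarchy:)

```cpp
A.lock();
// code 1
B.lock();      // B is taken while already holding A, consistent with the hierarchy
// code 2
B.unlock();
A.unlock();
```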
and a thread going
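(again, a hedged reconstruction:)

```cpp
B.lock();
// code 3
B.unlock();
```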
does not have the possibility of a deadlock. Whichever thread holds any of the locks it can take will eventually reach the point where it unlocks the mutex; there is never a point where one thread is waiting on a lock that won't eventually be released (again, assuming that there is nothing in code 1, 2, or 3 that attempts to wait on A or B).
I've moved the locks around to fix the variable setting that was being done in the spots you mentioned: d630718.
@keenanlang My apologies, you are correct that there was not a problem with a deadlock. However, your fix still leaves several problems in exportThread, in particular around how pImage, export_queue, and connected are accessed.
In my opinion the correct way to write exportThread is to hold the lock the entire time except when calling epicsThreadSleep on line 511. Why not do this?
pImage isn't being modified by any other thread, so that isn't a problem; export_queue is protected by the mutex; and while connected isn't guarded, the worst that would happen from a race condition is that a single extra frame might be exported if someone tried to disconnect, but that isn't exactly a problem.
While it wouldn't be a whole lot of extra code, I'd like to hold the lock for as little work as possible, since the multi-module Lambdas need as much processing time as possible available for the stitching code, especially as speeds and sizes increase.
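A hedged sketch of that minimal-locking idea (dataLock and processFrame are illustrative names, and export_queue's actual type in the driver may differ; only the shared-queue access is done under the lock):

```cpp
NDArray *frame = NULL;

dataLock.lock();                    // illustrative mutex guarding export_queue
if (!export_queue.empty()) {
    frame = export_queue.front();   // only the shared-queue access happens locked
    export_queue.pop_front();
}
dataLock.unlock();

if (frame) {
    processFrame(frame);            // expensive stitching/export work runs unlocked
}
```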
Adjusted.
The lock is also not being taken at the beginning of acquireThread, where the parameter library is being accessed.
GSECARS borrowed a Lambda detector from the detector pool a few weeks ago. We had some issues with the detector hanging up. I looked at the driver code today and found some issues with mutex locking. These are the issues:
This driver uses 2 EPICS mutexes.
A very important rule when threads use more than one mutex is that all threads must lock and unlock the mutexes in the same order; if they do not, a deadlock can occur. This is a problem in this driver, as the following locations show (a generic illustration of the deadlock risk follows the references):
ADLambda/LambdaApp/src/ADLambda.cpp, line 647 in 45eb245
ADLambda/LambdaApp/src/ADLambda.cpp, line 517 in 45eb245
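A minimal, generic illustration (plain C++ with std::mutex, not the actual ADLambda code) of how taking two mutexes in opposite orders from two threads can deadlock:

```cpp
#include <mutex>
#include <thread>

std::mutex mutexA;   // stands in for one of the driver's mutexes
std::mutex mutexB;   // stands in for the other

void thread1()
{
    std::lock_guard<std::mutex> a(mutexA);   // takes A first ...
    std::lock_guard<std::mutex> b(mutexB);   // ... then B
    // work on shared state
}

void thread2()
{
    std::lock_guard<std::mutex> b(mutexB);   // takes B first ...
    std::lock_guard<std::mutex> a(mutexA);   // ... then A: can deadlock against thread1
    // work on shared state
}

int main()
{
    // If thread1 gets A and thread2 gets B at the same time, each then blocks
    // forever waiting for the other's mutex and the program hangs.
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
    return 0;
}
```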
All calls to the asynPortDriver parameter library (setIntegerParam(), callParamCallbacks(), etc.) must be made with the main asynPortDriver mutex locked. However, this driver does not have that mutex locked when it calls the parameter library in the following places (a sketch of the expected locking pattern follows these references):
ADLambda/LambdaApp/src/ADLambda.cpp, line 477 in 45eb245
ADLambda/LambdaApp/src/ADLambda.cpp, line 526 in 45eb245
ADLambda/LambdaApp/src/ADLambda.cpp, line 657 in 45eb245
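For reference, a hedged sketch (a fragment, not the actual ADLambda code) of the usual pattern for a worker thread in a class derived from ADDriver: the asynPortDriver lock is held whenever the parameter library is called, and released only around blocking waits. myDriver, workerThread, and keepRunning are illustrative names; ADStatus and ADStatusAcquire are the standard ADDriver status parameter and value.

```cpp
void myDriver::workerThread()
{
    this->lock();                        // take the asynPortDriver (port) mutex
    while (keepRunning) {
        // Parameter library calls are made only while the lock is held
        setIntegerParam(ADStatus, ADStatusAcquire);
        callParamCallbacks();

        this->unlock();                  // release only around blocking/slow work
        epicsThreadSleep(0.1);           // e.g. waiting for hardware or new frames
        this->lock();
    }
    this->unlock();
}
```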
I have no way to test this PR. There may be places where you do want to release the lock for computationally expensive operations when no memory shared with other threads is being accessed, but this must be done very carefully and only when needed.