Using RTL condition variables in the RTL

John Skaller edited this page Nov 26, 2018 · 4 revisions

Felix has a C++ type

namespace flx { namespace pthread {
class PTHREAD_EXTERN flx_condv_t : public world_stop_notifier_t
{
  ::std::mutex m;
  ::std::condition_variable_any cv;
  void notify_world_stop() override;
  thread_control_base_t *tc;
public:
   flx_condv_t (thread_control_base_t *);
   void lock();
   void unlock();
   void wait();
   void timed_wait(double seconds);
   void signal();
   void broadcast();
   ~flx_condv_t();
};
}} // namespace flx::pthread

which is an implementation of a condition variable. To use it, first construct a value of the type, passing it a pointer to a thread_control_base_t. The reason for this is explained later.

The condition variable is used to test some volatile condition. To do so, you first have to call the lock() method. Next, you can check the condition.

There are now two cases. If the condition is satisfied, you must release the lock by calling unlock() and proceed with the work that was waiting on the condition.

If the condition is not satisfied, you must call the wait() method instead. The lock must be set when you call the wait() or timed_wait() method.

The wait() method blocks until either a timeout of 1 second expires or a signal is sent to the condition variable. The timed_wait() method is similar, except that you choose the timeout. The timeout ensures the condition will eventually be checked again, even if no signal is received.

On entry to the wait() or timed_wait() methods, the lock is released. When the wait is complete, the lock is re-acquired.

On exit from the wait() or timed_wait() method the lock is held. The usual action at this point is to jump back to the code which tests the condition again. This is often done in a loop.
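
The locking behaviour described above can be modelled with the standard library alone. The following is only a sketch: sketch_condv is a hypothetical stand-in that borrows the method names of flx_condv_t but leaves out the thread control object and all garbage collector integration, and checks_without_signal() is an illustrative helper. It shows that a waiter re-checks its condition after each timeout, so it still makes progress when nobody signals.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for flx_condv_t: same method names, no GC integration.
class sketch_condv {
  std::mutex m;
  std::condition_variable_any cv;
public:
  void lock() { m.lock(); }
  void unlock() { m.unlock(); }

  // Precondition: the caller holds the lock.
  // wait_for releases m while blocked and re-acquires it before returning.
  void timed_wait(double seconds) {
    auto timeout = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::duration<double>(seconds));
    cv.wait_for(m, timeout);
  }
  void wait() { timed_wait(1.0); }      // default timeout of 1 second
  void signal() { cv.notify_one(); }
  void broadcast() { cv.notify_all(); }
};

// Counts how often the waiter finds the condition false before it
// finally sees it satisfied. The other thread never sends a signal.
int checks_without_signal() {
  sketch_condv c;
  bool ready = false;                   // the condition, guarded by c's lock
  std::atomic<bool> waiting{false};

  std::thread setter([&] {
    while (!waiting) std::this_thread::yield();
    c.lock();                           // succeeds once the waiter is inside timed_wait()
    ready = true;
    c.unlock();                         // deliberately no signal()
  });

  int checks = 0;
  c.lock();
  waiting = true;
  while (!ready) {                      // test, wait, test again
    ++checks;
    c.timed_wait(0.01);
  }
  assert(ready);                        // the lock is held here
  c.unlock();
  setter.join();
  return checks;
}
```

Calling checks_without_signal() returns at least 1: the first test always fails because the setter cannot take the lock until the waiter has released it inside timed_wait().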

Typical code is:

flx_condv_t cv(tc); // tc is a thread_control_base_t*
cv.lock();
while (!condition_satisfied()) cv.wait();
// condition is now satisfied and the lock is held
cv.unlock();

When some code changes variables which may cause the condition to become satisfied, that code should call either the signal() or broadcast() method.

The signal() method releases one pthread waiting on the condition from its wait() or timed_wait() call. The broadcast() method releases all of the waiting pthreads.

The rules for signalling are as follows:

  • It is safe but inefficient to never issue a signal. This is because the Felix condition variable has a timeout.

  • It is safe, but possibly inefficient, to always use broadcast(). This may wake up too many threads, but the threads will check the condition in a serialised manner because the condition check is done whilst the lock is held. However, if the other threads should go back to sleep, the thread that proceeds must modify the state before releasing the lock.

  • It is not always correct to call signal(). Sometimes it is necessary to wake up more than one thread.
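
A case where more than one thread has to be woken is a gate that several threads wait behind. The sketch below uses std::mutex and std::condition_variable_any directly in place of flx_condv_t, with notify_all() playing the role of broadcast() and a 1 second wait_for() mimicking the built-in timeout. The function name open_gate_for is made up for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Starts n threads that wait for the gate, opens it, and returns how
// many threads got through.
int open_gate_for(int n) {
  std::mutex m;
  std::condition_variable_any cv;
  bool open = false;                    // the condition, only touched while m is held
  int passed = 0;

  std::vector<std::thread> waiters;
  for (int i = 0; i < n; ++i)
    waiters.emplace_back([&] {
      m.lock();
      while (!open)                     // m is released while blocked
        cv.wait_for(m, std::chrono::seconds(1));
      ++passed;                         // checks are serialised by the lock
      m.unlock();
    });

  m.lock();
  open = true;                          // change the state first ...
  m.unlock();
  cv.notify_all();                      // ... then wake every waiter

  for (auto &t : waiters) t.join();
  assert(passed == n);
  return passed;
}
```

With notify_one() in place of notify_all(), only one waiter would be woken promptly. The rest would still get through in this sketch, but only after their timeouts expire.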

Now we must explain the reason a thread_control_base_t must be passed. Notice that the condition variable class is derived from world_stop_notifier_t. When the condition variable is constructed, it registers itself with the thread control object. The registration is removed when the condition variable is destroyed.

Felix has a world-stop garbage collector. In order for the collector to work, the state of the memory to be scanned must be stable. To ensure this, all threads must stop working whilst the garbage collector is running.

Threads can yield control to the garbage collector by calling the yield() method of the thread control object. The yield() method checks if there is a request to perform a garbage collection. If so, the thread is suspended until the collection is completed. Otherwise, yield() returns immediately.
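
The yield protocol can be sketched as follows. This is a simplified model, not the Felix RTL: sketch_thread_control, world_stop() and demo_world_stop() are hypothetical names, the thread requesting the collection is assumed not to be one of the counted workers, and only one collection is requested at a time.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of a thread control object.
class sketch_thread_control {
  std::mutex m;
  std::condition_variable cv;
  bool stop_requested = false;
  int running;                          // workers not parked in yield()
  const int total;
public:
  explicit sketch_thread_control(int nworkers)
      : running(nworkers), total(nworkers) {}

  // Called by worker threads at points where their state is stable.
  void yield() {
    std::unique_lock<std::mutex> lk(m);
    if (!stop_requested) return;        // no collection requested
    --running;
    cv.notify_all();                    // the collector may be waiting for us
    cv.wait(lk, [this] { return !stop_requested; });
    ++running;
  }

  // Called by the thread that wants to collect. Waits until every
  // worker is parked, runs collect, then lets the workers continue.
  // Returns the number of workers that were parked during collect.
  template <class Collect>
  int world_stop(Collect collect) {
    std::unique_lock<std::mutex> lk(m);
    stop_requested = true;
    cv.wait(lk, [this] { return running == 0; });
    int parked = total - running;
    collect();
    stop_requested = false;
    cv.notify_all();
    return parked;
  }
};

int demo_world_stop() {
  const int n = 3;
  sketch_thread_control tc(n);
  std::atomic<bool> done{false};
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i)
    workers.emplace_back([&] {
      while (!done) tc.yield();         // "work" that yields regularly
    });

  bool collected = false;
  int parked = tc.world_stop([&] { collected = true; });
  done = true;
  for (auto &t : workers) t.join();
  assert(collected);
  return parked;
}
```

In this model a worker that never calls yield() would block the collection forever, which is why blocking operations such as a condition variable wait have to cooperate with the thread control object.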

The yield() method is called automatically inside the condition variable wait() and timed_wait() methods. There is no need to call it yourself, although you may.

Now, the reason for the registration is that when a collection is requested, the requesting thread broadcasts a signal to any threads waiting on a registered condition variable. This makes each such thread wake up and perform a yield, instead of waiting for a timeout to expire. Without the signal or the timeout, a thread might remain blocked in the condition variable, because the other threads that might signal it have already yielded to the collector. In this case we would have a deadlock, because the collector can't run, but there are no threads running to wake up the last one.
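
A minimal sketch of the registration scheme, assuming a registry held by the thread control object, might look like this. All names here (sketch_notifier, sketch_control, add, remove, request_world_stop, sketch_condv) are invented for illustration and are not the RTL's actual interface. Only notify_world_stop() mirrors the method shown in the class declaration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical counterpart of world_stop_notifier_t.
struct sketch_notifier {
  virtual void notify_world_stop() = 0;
  virtual ~sketch_notifier() = default;
};

// Hypothetical thread control object holding the registry.
class sketch_control {
  std::mutex m;
  std::set<sketch_notifier *> registered;
  std::atomic<bool> stop{false};
public:
  void add(sketch_notifier *n) { std::lock_guard<std::mutex> g(m); registered.insert(n); }
  void remove(sketch_notifier *n) { std::lock_guard<std::mutex> g(m); registered.erase(n); }
  bool stop_requested() const { return stop; }

  // Sets the request flag, then wakes waiters on every registered notifier.
  void request_world_stop() {
    stop = true;
    std::lock_guard<std::mutex> g(m);
    for (sketch_notifier *n : registered) n->notify_world_stop();
  }
};

// Condition variable that registers on construction, unregisters on destruction.
class sketch_condv : public sketch_notifier {
  std::mutex m;
  std::condition_variable_any cv;
  sketch_control *tc;
  void notify_world_stop() override { cv.notify_all(); }
public:
  explicit sketch_condv(sketch_control *c) : tc(c) { tc->add(this); }
  ~sketch_condv() override { tc->remove(this); }
  void lock() { m.lock(); }
  void unlock() { m.unlock(); }
  void timed_wait(double seconds) {     // caller must hold the lock
    cv.wait_for(m, std::chrono::duration_cast<std::chrono::microseconds>(
                       std::chrono::duration<double>(seconds)));
  }
};

bool world_stop_wakes_waiter() {
  sketch_control tc;
  sketch_condv c(&tc);
  bool saw_stop = false;
  std::thread waiter([&] {
    c.lock();
    // Nobody ever signals the user-level condition in this demo.
    while (!tc.stop_requested()) c.timed_wait(1.0);
    saw_stop = true;                    // a real wait() would now yield
    c.unlock();
  });
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  tc.request_world_stop();              // broadcasts to the registered condv
  waiter.join();
  return saw_stop;
}
```

The broadcast usually wakes the waiter at once. If the request happens to arrive just before the waiter blocks, the notification can be missed in this sketch, and the timeout then acts as the fallback.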

For this reason, Felix cannot usually use system condition variables. Note that in addition to suspending on a world-stop request by the collector, threads must also post their current stack pointer to the thread control object. This is because the collector must conservatively scan the machine stacks of all threads for managed pointers, which are treated as GC roots.