
Condition variables (again) #3892

Open
ijackson opened this issue Jun 25, 2021 · 15 comments
Labels
A-tokio (Area: The main tokio crate) · C-feature-request (Category: A feature request) · M-sync (Module: tokio/sync)

Comments

@ijackson
Contributor

Summary

(Hi. This is my first issue/MR against Tokio, so firstly: Hello and thanks for all the hard work making an impressive system!)

I need a condition variable. (I am updating a program from Rocket 0.4 to 0.5, so it is becoming async. I was using a condvar and restructuring my program to some other primitive would be awkward and un-idiomatic.)

Condvars are a very general and powerful synchronisation primitive, with relatively programmer-friendly properties. Implementing them by hand is complex and error-prone, and often achieving the same results by another method is too. I think the lack of condvars in the Tokio ecosystem is a serious omission.

Requested solution

A synchronisation primitive with the standard semantics of a condition variable (condvar) as found for example in pthreads or std::sync::Condvar.
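
For concreteness, a hypothetical sketch of the API shape being requested follows (this is not an existing tokio type; the names and signatures are illustrative only):

use tokio::sync::MutexGuard;

// Hypothetical sketch of the requested API: the same three operations as
// std::sync::Condvar, but with an async wait that releases and later
// re-acquires a tokio mutex.
pub struct Condvar { /* details omitted */ }

impl Condvar {
    /// Atomically release the lock, wait to be notified (spurious wakeups
    /// permitted), then re-acquire the lock before returning.
    pub async fn wait<'a, T>(&self, guard: MutexGuard<'a, T>) -> MutexGuard<'a, T> {
        unimplemented!()
    }

    /// Wake one waiter.
    pub fn notify_one(&self) { unimplemented!() }

    /// Wake all waiters.
    pub fn notify_all(&self) { unimplemented!() }
}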

Alternatives and prior history

None of the alternatives seem palatable. I doubt that I can replace a std::sync::Condvar with something from tokio::sync other than by effectively implementing condvars myself. Even if this is possible, it would involve me doing a complex multithreaded programming proof to convince myself that my approach was correct.

In general, a condvar might be needed because an existing program is being converted to async and uses a std::sync::Condvar, or because a program is being converted from C where condvars are available, or because it's just the most convenient primitive for the situation.

Expecting every programmer to do this individually, perhaps per-project, whenever a condvar is needed, is a really bad idea.

I reviewed the history of this feature request in Tokio. I found:

Task cancellation

I don't agree that the issues with task cancellation need to be a blocker. Let me consider notify_all and notify_one separately.

notify_all

notify_all does not need to worry about task cancellation. The guarantee we want is this:

  • Consider a task W, which holds the mutex and calls wait.

  • If any other task N, which has acquired the mutex after W released it as part of wait, calls notify_all, then W will stop waiting: either because it was cancelled, or because it wakes up with the mutex held and continues.

  • This applies to all qualifying tasks W.

If there are no waiters, then notify_all is a no-op. If all the waiters end up cancelled then it is a no-op. This is all fine.

notify_one

I think in an async context, it only makes sense to notify_one if you know that the tasks which are waiting will not be cancelled. After all, in practice there are few good ways for a task to defend itself against cancellation, and it is difficult to write code so that you can be sure it will not happen. Whereas the whole idea behind notify_one is to pass a "baton" or some such, and that depends on the recipient carrying on properly: not only completing the baton exchange, but actually carrying on with the underlying activities.

(In pthreads, a thread might be cancelled after being woken due to pthread's condvar notify. The pthreads docs say that in this case the thread won't eat a condvar notification. This is different to what I am proposing for Tokio. NB that pthread thread cancellation is a nightmare, quite unlike Rust async task cancellation, so people with any sense use it hardly ever if at all.)

So the guarantee we want is this:

  • Suppose a task N acquires (and perhaps releases) the mutex, and calls notify_one.

  • The candidate tasks W are those which, at the time of N's mutex acquisition, were blocked on wait.

  • Not all candidate tasks will remain blocked on wait. That is, at least one candidate task will either wake up or have been cancelled.

  • (If there are no candidate tasks, notify_one is a no-op and this is not detected, although it is probably a mishap.)

Or to put it another way we resolve the cancellation issue this way:

  • If any waiting tasks can be cancelled, notify_one is ineffective.

I think the same rule ought to apply to timeouts. The std docs are not clear about this, but pthread_cond_timedwait can "eat" a notification from pthread_cond_signal even if it returns ETIMEDOUT.

Spurious notifications

In #3742 (comment) a qualm was discussed, including this comment:

Maybe it could be handled by allowing spurious wakeups?

Spurious wakeups are of course allowed. This is even stated in the docs for std::sync::Condvar. (They're not desirable for perf reasons.)
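
For reference, the usual defence is the predicate loop that is already idiomatic with std::sync::Condvar; a spurious wakeup then just costs one extra iteration. A plain std example (nothing tokio-specific):

use std::sync::{Condvar, Mutex};

// Re-check the predicate after every wakeup, so a spurious wakeup cannot
// break correctness, only cost an extra loop iteration.
fn wait_until_ready(pair: &(Mutex<bool>, Condvar)) {
    let (lock, cvar) = pair;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        ready = cvar.wait(ready).unwrap();
    }
}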

Stupid design sketch

ISTM that it might be possible to cook up a cardboard cutout of a condvar implementation based on tokio::sync::watch. Rust's async system makes it possible for a task to obtain the Receiver::changed future before disposing of the mutex guard.

The resulting implementation would not have notify_one, just notify_all. That would be enough for my application. (Of course, implementing notify_one as a call to notify_all would be correct - it would uphold all the guarantees - but it would not be very useful; I think it would be better to leave it missing than to provide such a poor implementation.)

To have the desired API (ie to avoid needing a separate &Mutex passed into Condvar::wait) I think something like #3741 will be needed.
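
Roughly, such a watch-based cutout might look like the sketch below. This is only an illustration of the idea, not a worked-out implementation; note that wait still takes the &Mutex alongside the guard, which is exactly the inconvenience #3741 would remove.

use tokio::sync::{watch, Mutex, MutexGuard};

// Broadcast-only "condvar" sketch built on tokio::sync::watch.
pub struct BroadcastCondvar {
    tx: watch::Sender<()>,
}

impl BroadcastCondvar {
    pub fn new() -> Self {
        let (tx, _rx) = watch::channel(());
        BroadcastCondvar { tx }
    }

    /// Wake every task currently blocked in `wait` (a no-op if there are none).
    pub fn notify_all(&self) {
        // Publishing a new value marks every live receiver as "changed".
        let _ = self.tx.send(());
    }

    /// Release `guard`, wait for a notification (spurious wakeups possible),
    /// then re-acquire the mutex.
    pub async fn wait<'a, T>(
        &self,
        guard: MutexGuard<'a, T>,
        mutex: &'a Mutex<T>,
    ) -> MutexGuard<'a, T> {
        // Subscribe while still holding the mutex, so a notify_all issued by
        // any task that acquires the mutex after we release it is not lost.
        let mut rx = self.tx.subscribe();
        drop(guard);
        let _ = rx.changed().await;
        mutex.lock().await
    }
}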

Way forward

I need to see this fixed. One way or another I guess I am going to have to find (and double-check) or write an implementation of at least condvar broadcast for Tokio.

Currently I am about half a bottle of wine down and not in any state to think very hard about concurrent code :-).

I will look at this again tomorrow, in particular at the code in #3742, and at the other primitives available in Tokio, and decide what to do next.

@ijackson ijackson added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Jun 25, 2021
@Darksonn Darksonn added the M-sync Module: tokio/sync label Jun 26, 2021
@Darksonn
Contributor

I think in an async context, it only makes sense to notify_one if you know that the tasks which are waiting will not be cancelled.

This is most likely a bad assumption. For example, most web servers will cancel a request handler if the remote connection is closed, and I'm guessing this also applies to Rocket. I do not think we can just ignore cancellation.

@ijackson
Contributor Author

ijackson commented Jun 28, 2021

FYI I got nerdsniped by this over the weekend. I have an implementation (which does something more useful with cancellation) and am sorting out the tests, docs, etc. It turned out as a separate crate.

I definitely want something like #3742 #3741 so I will send an MR for that.

@ijackson
Contributor Author

ijackson commented Jul 5, 2021

Well, I think that yak is mostly shaved now. The result so far is here:
https://crates.io/crates/async-condvar-fair

Here is how I dealt with the cancellation issue:
https://docs.rs/async-condvar-fair/0.1.0/async_condvar_fair/struct.Baton.html

I haven't yet revisited #3741 which would make the use rather more convenient.

I think Tokio could do with a native condvar which uses the same Baton technique for handling cancellation, but which is not necessarily fair. My runtime-agnostic crate can't optimise which task to wake but Tokio's condvars could.

I think the ability to use different mutexes is important. There are good reasons for choosing both sync and async ones.

@Darksonn
Contributor

Darksonn commented Jul 6, 2021

I agree that the cancellation problems are avoidable if you allow spurious wakeups. I do like the idea of the wait_baton method.

I think we will run into trouble if we attempt to implement a Condvar that is able to talk to a synchronous mutex. The guard is not Send, and I don't see any way to have the compiler understand that the mutex is unlocked before yielding.

@ijackson
Contributor Author

ijackson commented Jul 6, 2021

I agree that the cancellation problems are avoidable if you allow spurious wakeups. I do like the idea of the wait_baton method.

I think spurious wakeups are part of the spec :-). They don't cause trouble in practical uses of condvars, provided that they aren't frequent enough to be a perf issue.

I think we will run into trouble if we attempt to implement a Condvar that is able to talk to a synchronous mutex. The guard is not Send, and I don't see any way to have the compiler understand that the mutex is unlocked before yielding.

I didn't find this a problem for async-condvar-fair. My test cases seem to work just fine with std::sync::Mutex. When you use a sync mutex you don't need to hold the guard over an await point - you only suspend inside condvar wait with the mutex unlocked.

@Darksonn
Contributor

Darksonn commented Jul 6, 2021

Just because they are part of the specification for std::sync::Condvar doesn't mean they have to be for our Condvar. As for your test cases, the problem is only triggered if the problematic code is inside a spawned task, not if it's in the main method of the test.

@ijackson
Contributor Author

ijackson commented Jul 6, 2021

I have poked my tests some more and it turned out that all my test cases were using single-threaded executors. I added a Send bound to my tasks and now I see the problem you are describing.

38 |     let mut guard = lock_async(&self.mx).await;
   |         --------- has type `Guard<'_, u32>` which is not `Send`
...
44 |       guard = self.cv.wait(for_wait(guard, &self.mx)).await;
   |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here, with `mut guard` maybe used later
45 |     }
46 |   }
   |   - `mut guard` is later dropped here

This is rather disappointing because actually the guard that goes into wait is a different one to the one that comes back from it, so it is not actually held. What is held instead is the future from Condvar::wait, which is Send.

Anyway, it can be made to work with parking_lot::Mutex since parking_lot has a send_guard feature which makes its guards Send.

@ijackson
Contributor Author

ijackson commented Jul 6, 2021

I've verified that with parking_lot's send_guard enabled, my test task futures are all Send. So it is indeed possible to use an async condvar with a sync mutex, which is nice - although it complicates the design and API of even a runtime-specific async condvar...

@kaimast

kaimast commented Dec 24, 2021

Hi all,

I thought about this some more.

As far as I understand, using Notify in the case where you always have at most one waiter should be more or less the same as using a Condvar. There might just be more spurious wakeups, as permits are stored even if no one is waiting.

It is a little more tricky with multiple waiters. The problem here is that after checking the condition and releasing the lock, the other task might call notify_waiters before the waiting task calls notified.
However, it should be sufficient to provide a method like notified_with_lock that takes a MutexGuard and does not drop it until the current task has been added to the Notify's waiter list.
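
Sketched as a free function, the idea might look like this. This is hypothetical, not an existing tokio API, and it assumes a tokio version whose Notified future has an enable() method for registering the waiter eagerly.

use tokio::sync::{MutexGuard, Notify};

// Hypothetical helper: register this task as a waiter on `notify` before
// releasing the lock, so a notify_waiters() issued by whoever takes the lock
// next cannot be missed.
async fn notified_with_lock<T>(notify: &Notify, guard: MutexGuard<'_, T>) {
    let notified = notify.notified();
    tokio::pin!(notified);
    // enable() registers the waiter immediately, rather than lazily on first poll.
    notified.as_mut().enable();
    drop(guard);
    notified.await;
}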

Additionally, it might be good to have some kind of helper function/macro to build a waiting loop. I usually write something like this, which seems very verbose:

loop {
    // Scope the MutexGuard to avoid a deadlock
    {
        let lock = mutex.lock().await;
        if check_condition(&*lock) {
            break;
        }
    }

    // Note: this is only safe if the other task calls notify_one, not notify_waiters
    notify.notified().await;
}
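
A helper along those lines could look something like the following rough sketch (the names are hypothetical, and it carries the same notify_one-only caveat as the loop above):

use tokio::sync::{Mutex, Notify};

// Hypothetical helper: loop until `cond` holds for the protected value,
// releasing the lock and waiting on `notify` between checks. Only reliable if
// the notifier uses notify_one (which stores a permit), not notify_waiters.
async fn wait_until<T>(mutex: &Mutex<T>, notify: &Notify, mut cond: impl FnMut(&T) -> bool) {
    loop {
        {
            let lock = mutex.lock().await;
            if cond(&*lock) {
                return;
            }
        } // the guard is dropped here, before we await

        notify.notified().await;
    }
}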

Unrelated: I am also not super happy with the current terminology in Notify. Usually method/function names should contain verbs and type names should be nouns.
What was the reasoning for calling it Notify::notified instead of, for example, Notify::wait? And for calling it Notify instead of Notifier?

@kaimast

kaimast commented Feb 23, 2022

Any thoughts or progress on this? I can help create a new pull request if needed.

@cameronelliott

I think this is a really important feature for Tokio; at least for my purposes it seems so.
But the fact that the crate from @ijackson is AGPL rather than MIT/Apache complicates both understanding the licensing ramifications and actually complying with the license.

It would be wonderful if there were a Tokio-compatible async Condvar that follows the prevalent MIT/Apache licensing used by Tokio.

Thanks to @ijackson for showing how this can even be done.

@cameronelliott

It's probably worth noting that there is an MIT/Apache async Condvar in the async-std crate.

It lacks a lot of the wonderful functionality that the Condvar from @ijackson has, but it could be reworked and included in Tokio without affecting Tokio's licensing.

@kaimast

kaimast commented May 8, 2022

I have been using a slightly modified Notify as a Condition Variable for a while now (similar to the approach I described before).

I don't have time right now to make a properly documented pull request but I opened a draft here so it can be used for future reference: #4668

@kaimast

kaimast commented Feb 18, 2023

FYI: I put a very basic implementation that has been working fine for me here.

@Mek101

Mek101 commented Jun 11, 2024

Well, I think that yak is mostly shaved now. The result so far is here:
https://crates.io/crates/async-condvar-fair

That's unfortunately a big no-no for me due to the license
