WaitForEvent( 0 ) can return WAIT_TIMEOUT under lock contention when actually in the signaled state #18
Nice catch. The only reason there was a separate lock: the original implementation used a spinlock for gaining access to the event object's internals, but still had the mutex because it's required for the condition variables to marshal correctly. Given the increase in surface area for synchronization issues, combined with the fact that the mutex was still used to wake one waiting thread at a time (via the condition variable), I ended up just folding everything into the mutex. I think (but I'll need to study the code again to be sure) there's an optimization that should safely mitigate at least some of the performance drawbacks of dropping the trylock.
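The mutex/condition-variable coupling mentioned above, as a minimal sketch (illustrative only, not pevents' actual internals):

```cpp
#include <pthread.h>

// A condition variable can only sleep and wake correctly in tandem with a
// mutex, which is why the mutex couldn't simply be replaced by the spinlock.
pthread_mutex_t Mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t CVariable = PTHREAD_COND_INITIALIZER;
bool State = false;

void Wait() {
    pthread_mutex_lock(&Mutex);
    while (!State) {
        // Atomically releases Mutex and sleeps; reacquires Mutex on wake.
        pthread_cond_wait(&CVariable, &Mutex);
    }
    pthread_mutex_unlock(&Mutex);
}

void Set() {
    pthread_mutex_lock(&Mutex);
    State = true;
    pthread_mutex_unlock(&Mutex);
    pthread_cond_signal(&CVariable); // wakes one waiting thread at a time
}
```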
Test case based off the pseudocode provided by @qwertymaster617 in #18.
I've had a lot to think about regarding your patches this past week, and I have a few issues with this solution.
I've had to brush up significantly on my memory-ordering rules this week, so I may be incorrect. But here's what I think.

First: I've looked into the issue presented in the URL cited in the source, and IMHO the concern you bring up doesn't apply here. From cppreference, on Release-Acquire ordering:

> If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A become visible side-effects in thread B.

This is exactly what a mutex unlock-lock pair does. And since the call to `state.store` is sequenced-before the mutex unlock (the release) in thread A, the release-acquire rule applies to `state` through the mutex. Therefore, I say a relaxed load is sufficient here. Finally, the only remaining overhead is the interlocked instruction being used while holding the lock, which is unfortunate but necessary because of the naked load outside the lock.

Second: The optimization that replaced the bug seems nice, but why not go all the way? I believe the same reasoning applies to both cases. Just as with checking the "possibly-or-not-possibly-signaled" state and assuming failure if the data in the cache says unsignaled, we can absolutely assume the same thing when the cache says a manual-reset event is signaled, and return success.
Of course, we won't have the opportunity to clean up expired shared waits under another code path, but I'd argue that this may be a better tradeoff when checking for the instantaneous state. I'm not entirely convinced this is valid, but I can't think of any sure-fire reason why it is less correct in reasoning, or just plain wrong, compared to the current optimization you present with this patch, and I feel like we may be the only people worth asking about the specifics of this particular subject. What do you think?

The WaitForMultiple problem just might be my favorite problem in the world. It never ceases to disappoint.
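To make the First point above concrete, a minimal sketch of release-acquire ordering through a mutex (this uses a bare `std::mutex` and `std::atomic`, not pevents' actual code):

```cpp
#include <atomic>
#include <mutex>
#include <thread>

std::mutex m;
std::atomic<bool> state{false};

void writer() { // "thread A"
    std::lock_guard<std::mutex> lock(m);
    state.store(true, std::memory_order_relaxed);
} // unlocking m is a release operation

void reader() { // "thread B"
    std::lock_guard<std::mutex> lock(m); // locking m is the matching acquire
    // If writer's critical section came first, this relaxed load is
    // guaranteed to observe true: the store is sequenced-before the unlock,
    // and the unlock synchronizes-with this lock.
    bool observed = state.load(std::memory_order_relaxed);
    (void)observed;
}

int main() {
    std::thread a(writer), b(reader);
    a.join();
    b.join();
}
```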
So I keep thinking about my second point in my post above. Even if everything I said there is true, I think the reason it could be an incorrect assertion to return success for a manual-reset event when load() says signaled and timeout == 0 is, philosophically speaking, that when waiting for or waiting on a given event, we must assume failure to wait on it at all times. That's why we use it. And importantly, the reason for this is the converse: if we were ever to assume success when checking instantaneously, then why would we ever need to use an event? Why not just comment out the wait call entirely?

By simply placing the call to waitfor, we carefully approach that spooky-looking, I-think-is-only-a-2-way stop sign intersection (event.waitfor) at 2am when it's raining... and snowing. It's never safe to roll through it. You can do it, but it's risky. Even if someone is waving you on with an orange vest, you still instinctively slow down and look as much as possible. Someone may appear right on top of you (the boogeyman?). So instead, you assume it's not safe until you stop (acquire the lock), make sure there is no boogeyman (confirm state), and then cross (release).

If you ever assume, when waiting on an event, that you can just go, then just go, and ask yourself why you ever need to call an event waitfor. There would be no reason to use it then. Just execute. And my guess is one's code would subsequently look dramatically different to handle such a case, in comparison to code that is using an event waitfor. Thus, I don't think we can assume a successful wait just because the cached state says it's OK when we haven't obtained the lock. Is this correct reasoning? Is it valid in this context? Is it valid in any context?

That metaphor was somewhat helpful to me, but I'm not sure it's quite right, either. In this context, I guess we can say that the default state of an event is non-signaled; otherwise, why have the functionality at all? So, if we check the cache and it says non-signaled, assuming the default (a failed wait) produces no observable side effects.

However, if we were ever to assume the opposite, i.e. a successful wait, as I state in my previous post, then reporting success without synchronizing breaks the contract of no observable side effects, because now the program may think the non-default case is/has occurred. That, I believe, is what breaks the no-observable-side-effects rule underpinning the validity of atomic loads and stores, and it is why the current implementation is (most? technically?) correct.

I'm a noob to memory order. I have gone down the rabbit hole; please, someone point out anything I got wrong. If this is all obvious to others, I can assure you that it certainly was not to me a week ago, and it most certainly will not be several weeks from now (C++ is no longer my profession) before I forget most of this context. So I'm capturing it for my benefit, as well as for any poor sap who can follow this huge, dry train of thought.

All that being said, does this seem to be a sound explanation as to why my second point is not correct, and why the current implementation is most likely the best we'll get for instantaneous state checking?
I confused myself a bit in an earlier reply. You might want to look at 9c22e4e, which changes the memory access semantics based on an article I found that addressed my concerns regarding resetting the state.

But specifically regarding an early return for manual-reset events, I don't think that's actually a matter of memory access semantics. It would definitely be valid to do an acquire load and return success if the state is set, because manual-reset events are racy by nature, so any consecutive calls to the wait function can race against a reset anyway.

A relaxed load will (by definition) return a potentially stale value. It can be argued that the semantics of WFMO(0) mean false negatives are OK, but if you remember your other bug, that's not strictly the case. The real question is whether a calling thread can distinguish between a potential race condition with another thread (calling ResetEvent for either manual or auto events, or even calling WFMO in the case of auto events), a stale value, or an internal optimization (such as the pthread trylock).

If it's impossible to devise a test where one can explicitly rule out race conditions while still possibly getting a stale result, then it's OK to conflate the results and return a stale false as a `WAIT_TIMEOUT`. Once you put it like that, it becomes easy to see why we can't serve a stale positive:

```cpp
auto mreset = CreateEvent(true, true);    // manual reset, initially set
auto areset1 = CreateEvent(false, false); // auto reset, not set
auto areset2 = CreateEvent(false, false); // auto reset, not set
auto areset3 = CreateEvent(false, false); // auto reset, not set

CreateThread([&] {
    WaitForSingleEvent(areset1);
    ResetEvent(mreset);
    SetEvent(areset2);
});

CreateThread([&] {
    // Force caching of currently available state
    assert(WaitForSingleEvent(mreset, 0) == 0);
    SetEvent(areset1);
    WaitForSingleEvent(areset3);
    assert(WaitForSingleEvent(mreset, 0) != 0);
});

WaitForSingleEvent(areset2);
SetEvent(areset3);
```

The main thread starts thread 1 and then thread 2. Thread 2 caches the current state of mreset (available), triggers thread 1 to reset the manual event, and waits for areset3, at which point our design guarantees that mreset is not available. But there is no forced synchronization between thread 1 and thread 2 after thread 1 resets mreset, because thread 1 tells thread 2 to check mreset indirectly via areset3 (handled by the main thread). Via static analysis, we know the second call to `WaitForSingleEvent(mreset, 0)` must return `WAIT_TIMEOUT`, yet a stale cached read could report the event as available.

What this tells us is that even the current optimization to return false in case of a relaxed-load false value is technically incorrect, since you can invert the test (initial mstate is false, and instead of calling ResetEvent, thread 1 calls SetEvent) and end up failing the final assert on a stale negative.

I have half a mind to reintroduce the timed wait optimization and just document that a spurious `WAIT_TIMEOUT` is possible.
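For reference, the inverted test described above, adapted into runnable form (assuming the pevents API from `pevents.h` and `std::thread` in place of the pseudocode's `CreateThread`; `mreset` starts unset, and thread 1 calls `SetEvent` instead of `ResetEvent`):

```cpp
#include <cassert>
#include <thread>
#include "pevents.h" // NeoSmart pevents

using namespace neosmart;

int main() {
    auto mreset = CreateEvent(true, false);   // manual reset, initially unset
    auto areset1 = CreateEvent(false, false); // auto reset, not set
    auto areset2 = CreateEvent(false, false); // auto reset, not set
    auto areset3 = CreateEvent(false, false); // auto reset, not set

    std::thread t1([&] {
        WaitForEvent(areset1);
        SetEvent(mreset); // inverted: set instead of reset
        SetEvent(areset2);
    });

    std::thread t2([&] {
        // Force caching of the currently unavailable state
        assert(WaitForEvent(mreset, 0) != 0);
        SetEvent(areset1);
        WaitForEvent(areset3);
        // By construction, mreset is now guaranteed to be set, yet a stale
        // relaxed read could still report WAIT_TIMEOUT here:
        assert(WaitForEvent(mreset, 0) == 0);
    });

    WaitForEvent(areset2);
    SetEvent(areset3);
    t1.join();
    t2.join();
}
```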
This requires a memory barrier to ensure correctness (avoiding a false positive caused by a stale read), but it should still be cheaper than a syscall, and it only happens if we expect the event to be available (based on the initial relaxed read). See #18
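A sketch of the pattern that commit message describes, assuming C++11 atomics (the names and structure here are illustrative, not the actual patch):

```cpp
#include <atomic>

std::atomic<bool> State{false};

// Zero-timeout availability check for a manual-reset event: peek cheaply
// first, and only pay for the stronger read when the peek suggests the
// event is available.
bool PeekSignaled() {
    if (!State.load(std::memory_order_relaxed)) {
        return false; // likely unsignaled; no barrier needed on this path
    }
    // Confirm with acquire semantics (the "memory barrier" the commit
    // message mentions) before reporting success.
    return State.load(std::memory_order_acquire);
}
```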
The `pthread_mutex_trylock` implementation of `WaitForEvent` (with no timeout) is flawed, as it can easily return `WAIT_TIMEOUT` when an event, either auto-reset or manual-reset, is clearly in the signaled state. This issue only affects `WaitForEvent` calls where timeout = 0.

First, consider an already-signaled manual-reset event with 2 threads, `A` and `B`, each calling `WaitForEvent( 0 )` concurrently in a tight loop. If `A` is holding the lock while `B` attempts to grab the lock, `B` will erroneously return `WAIT_TIMEOUT`, even though the event is always in the signaled state.

Since we know that:

- no thread is calling `ResetEvent`, and
- neither `A` nor `B` can change the signaled state of a manual-reset event solely by calling `WaitForEvent`,

we know the `pthread_mutex_trylock` implementation is incorrect here.
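A minimal sketch of this first scenario as a test (this assumes the pevents API from `pevents.h`; the loop count is arbitrary):

```cpp
#include <cassert>
#include <thread>
#include "pevents.h" // NeoSmart pevents: CreateEvent, WaitForEvent, DestroyEvent

using namespace neosmart;

int main() {
    // Manual-reset event, created in the signaled state; nothing ever resets it.
    neosmart_event_t event = CreateEvent(true, true);

    auto spin = [&] {
        for (int i = 0; i < 1000000; ++i) {
            // The event is signaled for the entire run, so a zero-timeout
            // wait must always succeed; under the trylock implementation,
            // lock contention makes this assert fire spuriously.
            assert(WaitForEvent(event, 0) == 0);
        }
    };

    std::thread a(spin), b(spin);
    a.join();
    b.join();
    DestroyEvent(event);
}
```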
Second, consider an already-signaled auto-reset event with several threads, each calling `SetEvent` concurrently.
If some other thread, `C`, continuously loops on calls to `WaitForEvent` along these lines:
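(A sketch; the re-signaling placement is an assumption, but per the conditions below, the essential property is that `C` puts `event` into the signaled state on every iteration before checking it.)

```cpp
#include <cassert>
#include "pevents.h" // assumed: NeoSmart pevents API

using namespace neosmart;

// One possible shape for C's loop: C re-signals the event itself, so that
// on every iteration the event is in the signaled state when checked.
void LoopC(neosmart_event_t event) {
    for (;;) {
        SetEvent(event); // known-signaled before the zero-timeout check
        // C is the only waiter and nothing calls ResetEvent, so this should
        // always succeed; under the trylock implementation it can still
        // return WAIT_TIMEOUT when a concurrent SetEvent holds the lock.
        assert(WaitForEvent(event, 0) == 0);
    }
}
```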
At some point, if one of the threads is calling `SetEvent` while `C` just begins a call to `WaitForEvent`, `C`'s call to `WaitForEvent` will incorrectly return `WAIT_TIMEOUT`, simply because `C` may fail to grab the lock.

Since we know that:

- `C` is the only waiter,
- there are no calls to `ResetEvent` by any thread, and
- `C` ensures that upon every iteration of the loop, `event` is in the signaled state,

`C`'s call to `WaitForEvent` should never return `WAIT_TIMEOUT` in this case.

Test cases exhibit this behavior perfectly. I'd put some up, but they're easy enough to reason about that it's not really necessary.
The only fix I see is to replace the `pthread_mutex_trylock` logic with a simple call to `pthread_mutex_lock` only.

I almost additionally recommended making `event->State` atomic and returning `WAIT_TIMEOUT` or 0 based on the value of an atomic load of `event->State`, except that auto-reset events must be reset before returning. This might warrant introducing extra logic just to figure out whether you're able to reset the state. Since it's unclear what the ramifications are in that case, I think the only simply-correct solution is to grab the lock and block if it's already taken.
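A sketch of the proposed fix for the zero-timeout path (field names like `Mutex`, `State`, and `AutoReset` are illustrative, not pevents' actual internals):

```cpp
#include <pthread.h>

struct event_t {
    pthread_mutex_t Mutex;
    bool State;
    bool AutoReset;
};

const int kWaitTimeout = 258; // stand-in for WAIT_TIMEOUT

int WaitForEventZeroTimeout(event_t *event) {
    // Previously: a failed pthread_mutex_trylock here was reported as
    // WAIT_TIMEOUT, conflating lock contention with the event's state.
    pthread_mutex_lock(&event->Mutex); // fix: block until State can be inspected

    int result = kWaitTimeout;
    if (event->State) {
        result = 0;
        if (event->AutoReset) {
            event->State = false; // auto-reset events consume the signal
        }
    }
    pthread_mutex_unlock(&event->Mutex);
    return result;
}
```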