Fix the buggy Future and Promise implementations #299
Merged: BewareMyPower merged 4 commits into apache:main from BewareMyPower:bewaremypower/fix-macos-future-wait on Jul 5, 2023
BewareMyPower requested review from merlimat, RobertIndie, Demogorgon314 and shibd on July 3, 2023 16:10
Fixes apache#298

### Motivation

Currently `Future` and `Promise` are implemented manually by managing condition variables. However, the condition variable sometimes behaves incorrectly on macOS, while the existing `future` and `promise` from the C++ standard library work well.

### Modifications

Redesign `Future` and `Promise` based on the utilities in the standard `<future>` header.

In addition, fix the possible race condition when `addListener` is called after `setValue` or `setFailed`:
- Thread 1: calls `setValue`, swaps out the existing listeners and calls them one by one outside the lock.
- Thread 2: calls `addListener`, detects that `complete_` is true and calls the listener directly.

Now, the previous listeners and the new listener are called concurrently in threads 1 and 2.

This patch fixes the problem by adding a future that waits until all listeners added before completion have finished.

### Verifications

Ran the reproduction code in apache#298 10 times; it never failed or hung.
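The listener-ordering fix described above can be sketched roughly as follows. This is a minimal illustration built on `std::promise` and `std::shared_future`, not the code from this PR: the class `SharedStateSketch`, its member names, and the helper `demoListenerOrder` are all hypothetical, and error results (`setFailed`) are omitted.

```cpp
#include <functional>
#include <future>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified state shared by a promise/future pair.
template <typename T>
class SharedStateSketch {
   public:
    using Listener = std::function<void(const T&)>;

    // Returns false if the state was already completed.
    bool complete(const T& value) {
        std::vector<Listener> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (completed_) return false;
            completed_ = true;
            pending.swap(listeners_);  // take the listeners registered so far
        }
        valuePromise_.set_value(value);
        for (auto& listener : pending) listener(value);  // called outside the lock
        listenersDonePromise_.set_value();  // earlier listeners have all finished
        return true;
    }

    void addListener(Listener listener) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!completed_) {
                listeners_.push_back(std::move(listener));
                return;
            }
        }
        // Already completed: wait for the earlier listeners before running the
        // new one, so the two groups never run concurrently.
        std::shared_future<void> done = listenersDone_;
        done.wait();
        std::shared_future<T> value = value_;
        listener(value.get());
    }

   private:
    std::mutex mutex_;
    bool completed_ = false;
    std::vector<Listener> listeners_;
    std::promise<T> valuePromise_;
    std::shared_future<T> value_{valuePromise_.get_future().share()};
    std::promise<void> listenersDonePromise_;
    std::shared_future<void> listenersDone_{listenersDonePromise_.get_future().share()};
};

// Hypothetical usage: one listener added before completion, one after.
std::vector<int> demoListenerOrder() {
    SharedStateSketch<int> state;
    std::vector<int> calls;
    state.addListener([&calls](const int& v) { calls.push_back(v); });
    state.complete(42);
    state.addListener([&calls](const int& v) { calls.push_back(v + 1); });
    return calls;
}
```

One consequence of this design is that a late `addListener` call blocks for as long as any earlier listener is still running, so it depends on listeners not blocking indefinitely.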
BewareMyPower force-pushed the bewaremypower/fix-macos-future-wait branch from c2453ef to 00473ec on July 4, 2023 07:50
BewareMyPower force-pushed the bewaremypower/fix-macos-future-wait branch from 96f4e20 to 3d16f50 on July 4, 2023 08:52
BewareMyPower force-pushed the bewaremypower/fix-macos-future-wait branch from 3d16f50 to 8ff31fe on July 4, 2023 08:59
Demogorgon314 approved these changes on Jul 4, 2023
Great work! After this change, the reproduction code never failed or hung.
shibd approved these changes on Jul 4, 2023
RobertIndie reviewed on Jul 4, 2023
@RobertIndie PTAL again.
RobertIndie approved these changes on Jul 5, 2023
LGTM!
Co-authored-by: Zike Yang <[email protected]>
BewareMyPower added a commit to BewareMyPower/pulsar-client-cpp that referenced this pull request on Oct 26, 2023
### Motivation

There is a case where a deadlock can happen for a `Future`. Assume there is a `Promise` and its `Future`.

1. Call `Future::addListener` to add a listener that tries to acquire a user-provided mutex (`lock`).
2. Thread 1: Acquire `lock` first.
3. Thread 2: Call `Promise::setValue`. The listener is triggered before the future is completed. Since `lock` is held by Thread 1, the listener is blocked.
4. Thread 1: Call `Future::addListener`. Since it detects that `InternalState::completed_` is true, it calls `get` to retrieve the result and value.

Then, the deadlock happens:
- Thread 2 waits for `lock` to be released before it can complete `InternalState::future_`.
- Thread 1 holds `lock` but waits for `InternalState::future_` to be completed.

In a real-world case, if we acquire a lock before `ProducerImpl::closeAsync`, and then another thread calls `setValue` in `ClientConnection::handleSuccess` while the callback of `createProducerAsync` tries to acquire the same lock, `handleSuccess` will be blocked. Then in `closeAsync`, the current thread will be blocked in:

```c++
cnx->sendRequestWithId(Commands::newCloseProducer(producerId_, requestId), requestId)
    .addListener([self, callback](Result result, const ResponseData&) { callback(result); });
```

The stacks:

```
Thread 1:
#11 0x00007fab80da2173 in pulsar::InternalState<...>::complete (this=0x3d53e7a10, result=..., value=...) at lib/Future.h:61
#13 pulsar::ClientConnection::handleSuccess (this=this@entry=0x2214bc000, success=...) at lib/ClientConnection.cc:1552
Thread 2:
#8 get (result=..., this=0x3d53e7a10) at lib/Future.h:69
#9 pulsar::InternalState<...>::addListener (this=this@entry=0x3d53e7a10, listener=...) at lib/Future.h:51
#11 0x00007fab80e8dc4e in pulsar::ProducerImpl::closeAsync at lib/ProducerImpl.cc:794
```

Two points cause the deadlock:
1. We use `completed_` to represent whether the future is completed. However, after it becomes true, the future might not be completed yet, because the value is not set and the listeners have not finished.
2. If `addListener` is called after completion, we still push the listener to `listeners_` so that previous listeners are executed before the new listener. This guarantee is unnecessarily strong.

### Modifications

First, complete the future before calling the listeners. Then, use an enum to represent the status:
- INITIAL: `complete` has not been called.
- COMPLETING: the first time `complete` is called, the status changes from INITIAL to COMPLETING.
- COMPLETED: the future is completed.

In addition, the implementation of `Future` is simplified. apache#299 fixes a possible mutex crash by introducing `std::future`. However, the root cause is that the condition variable is not used correctly:

> Even if the shared variable is atomic, it must be modified while owning the mutex to correctly publish the modification to the waiting thread.

See https://en.cppreference.com/w/cpp/thread/condition_variable

The simplest way to fix apache#298 is just adding `lock.lock()` before `state->condition.notify_all();`.
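The three-state design described under Modifications can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the merged code: `InternalStateSketch`, its members, and `demoLateListenerDoesNotBlock` are hypothetical names, the result code is omitted, and `T` is assumed to be default-constructible and copyable. The value is published and the status set to COMPLETED before any listener runs, so a late `addListener` never has to wait for earlier listeners.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

enum class Status { INITIAL, COMPLETING, COMPLETED };

// Hypothetical sketch of a future state driven by a three-state status.
template <typename T>
class InternalStateSketch {
   public:
    using Listener = std::function<void(const T&)>;

    bool complete(const T& value) {
        Status expected = Status::INITIAL;
        // Only the first caller moves INITIAL -> COMPLETING.
        if (!status_.compare_exchange_strong(expected, Status::COMPLETING)) return false;

        std::vector<Listener> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            value_ = value;
            status_ = Status::COMPLETED;  // published while owning the mutex
            pending.swap(listeners_);
        }
        cond_.notify_all();
        // Listeners run after completion, outside the lock.
        for (auto& listener : pending) listener(value);
        return true;
    }

    void addListener(Listener listener) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (status_ == Status::COMPLETED) {
            T value = value_;
            lock.unlock();
            listener(value);  // does not wait for earlier listeners
        } else {
            listeners_.push_back(std::move(listener));
        }
    }

    T get() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return status_ == Status::COMPLETED; });
        return value_;
    }

   private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::atomic<Status> status_{Status::INITIAL};
    T value_{};
    std::vector<Listener> listeners_;
};

// Hypothetical replay of the deadlock scenario: the first listener blocks on a
// user lock held by the thread that later calls addListener.
bool demoLateListenerDoesNotBlock() {
    InternalStateSketch<int> state;
    std::mutex userLock;
    std::atomic<int> calls{0};
    state.addListener([&](const int&) {
        std::lock_guard<std::mutex> guard(userLock);
        ++calls;
    });

    std::unique_lock<std::mutex> held(userLock);        // this thread holds the user lock
    std::thread completer([&] { state.complete(1); });  // its listener blocks on userLock
    int value = state.get();                            // returns once the value is published
    state.addListener([&](const int&) { ++calls; });    // runs immediately
    held.unlock();
    completer.join();
    return value == 1 && calls == 2;
}
```

The trade-off is the weaker ordering guarantee mentioned in the commit message: a listener added after completion may run while earlier listeners are still executing.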
merlimat pushed a commit that referenced this pull request on Oct 30, 2023
#334)

* Fix possible deadlock of Future when adding a listener after completed
* Acquire lock again
* Add initial value
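The condition-variable rule quoted in the commit message (the shared variable must be modified while owning the mutex) can be illustrated with a small sketch. The `LatchSketch` type and `demoSignalAndWait` helper are hypothetical and only demonstrate the general pattern; they are not taken from the Pulsar code base.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot latch showing correct condition variable usage.
struct LatchSketch {
    std::mutex mutex;
    std::condition_variable condition;
    bool done = false;

    void signal() {
        {
            // Modify the shared flag while owning the mutex so that a waiter
            // cannot miss the update between checking the flag and blocking.
            std::lock_guard<std::mutex> lock(mutex);
            done = true;
        }
        condition.notify_all();
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex);
        // The predicate overload re-checks the flag after every wakeup,
        // which also handles spurious wakeups.
        condition.wait(lock, [this] { return done; });
    }
};

// Hypothetical usage: one thread signals, the caller waits.
bool demoSignalAndWait() {
    LatchSketch latch;
    std::thread signaller([&latch] { latch.signal(); });
    latch.wait();
    signaller.join();
    return latch.done;
}
```

Here the notification is sent after the lock is released; calling `notify_all()` while still holding the lock, as the commit message suggests with `lock.lock()`, is also a valid way to use a condition variable.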
### Documentation

- `doc-required` (Your PR needs to update docs and you will update later)
- `doc-not-needed` (Please explain why)
- `doc` (Your PR contains doc changes)
- `doc-complete` (Docs have been already added)