-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] Fix deadlock in PendingAckHandleImpl #18989
Conversation
...src/main/java/org/apache/pulsar/broker/transaction/pendingack/impl/PendingAckHandleImpl.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/pulsar/broker/transaction/pendingack/impl/PendingAckHandleImpl.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please be sure to update the PR description since the code no longer aligns with it.
...src/main/java/org/apache/pulsar/broker/transaction/pendingack/impl/PendingAckHandleImpl.java
Show resolved
Hide resolved
if (recoverTime.getRecoverStartTime() != 0L && recoverTime.getRecoverEndTime() == 0L) { | ||
recoverTime.setRecoverEndTime(System.currentTimeMillis()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the synchronized
keyword in the method declaration, this conditional update to recoveredEndTime
is subject to data races, and the variables will not be safely published to other threads. However, at this time, the recoverTime
object is only used for stats, and we have a tendency to have relaxed requirements for stats. Perhaps it is fine. We could alternatively add methods to the class that ensure proper synchronization of updates. I don't feel strongly, except to mention that it is somewhat brittle to assume the variable will only ever be used by stats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're totally right. This recoveryTime
registers the time of the replay. It should be handled directly in the callback after the replay is completed. For the current usage, being non thread-safe is fine and I'd prefer leave it as is.
Those methods don't need synchronization at all since, at the moment, there's no data race for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Codecov Report
@@ Coverage Diff @@
## master #18989 +/- ##
============================================
+ Coverage 46.35% 46.72% +0.36%
- Complexity 8939 10525 +1586
============================================
Files 597 706 +109
Lines 56858 69004 +12146
Branches 5905 7392 +1487
============================================
+ Hits 26357 32241 +5884
- Misses 27616 33173 +5557
- Partials 2885 3590 +705
Flags with carried forward coverage won't be shown. Click here to find out more.
|
(cherry picked from commit 22866bd)
(cherry picked from commit 22866bd)
Fixes #18988
Motivation
Based on stacktrace in the issue, in
PendingAckHandleImpl
, if the first timependingAckHandleCompletableFuture
is completed via the methodaddConsumer
thePendingAckHandleImpl
object is locked to the current thread because the methodcompleteHandleFuture
is synchronized. TheaddConsumer
then locks thePersistentSubscription
. In the reverse order the subscription closing procedure locks thePersistentSubscription
object and thenPendingAckHandleImpl
.It's possible that if they run concurrently it ends up in a deadlock
Modifications
completeHandleFuture
methods. In this way thePendingAckHandleImpl
object isn't locked in the stages chain.I haven't added tests since I wasn't able to reproduce it programmatically.
Verifying this change
Documentation
doc
doc-required
doc-not-needed
doc-complete