[Bug] zk EventThread get block after zk leader stop #18269
Comments
Similar to #16712.
Update:
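import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class Scratch {
    // Released the first time any thread enters checkState().
    private static final CountDownLatch latch = new CountDownLatch(1);

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Scratch s = new Scratch();
        // Scheduled thread: stands in for the metadata-store-zk-session-watcher thread.
        scheduler.scheduleAtFixedRate(s::checkState, 2, 2, TimeUnit.SECONDS);
        latch.await();
        // Main thread: stands in for the zk EventThread calling process().
        while (true) {
            s.process();
        }
    }

    private void process() {
        checkState();
    }

    // Both threads contend for the same intrinsic lock on this instance.
    private synchronized void checkState() {
        System.out.println(Thread.currentThread().getName() + " checking state");
        if (latch.getCount() > 0) {
            latch.countDown();
        }
        // Never completed, so get() always waits the full 2 seconds while holding the lock.
        CompletableFuture<Void> future = new CompletableFuture<>();
        try {
            future.get(2, TimeUnit.SECONDS);
        } catch (TimeoutException ignore) {
            // Expected: simulates an operation that always times out.
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(Thread.currentThread().getName() + " finished");
    }
}
Output: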
I think this bug is also resolved by #17909. The reported version appears to be 2.8.1, while the fix was only picked to 2.9.x and later. @codelipenghui do we have a released version including the fix that @Shawyeok can try?
I've talked with @codelipenghui, and I'll pick that PR into the release branch in my org. Thanks again for pointing this out, @codelipenghui @tisonkun. I'm going to close this one.
Search before asking
Version
Minimal reproduce step
Set up a Pulsar cluster and a zk cluster. The issue can sometimes (not always) be reproduced after stopping the zk leader.
What did you expect to see?
The zk client should reconnect to another zk node properly.
What did you see instead?
main-EventThread is blocking on the org.apache.pulsar.metadata.impl.ZKSessionWatcher#process method.
main-EventThread log:
metadata-store-zk-session-watcher-7-1 log:
org.apache.pulsar.metadata.impl.ZKSessionWatcher#currentStatus is SessionLost.
The zk EventThread internal task queue waitingEvents has many events to process.
The keeperState of the event that the zk EventThread is currently processing is SyncConnected.
It seems that the zk exists operation always times out because the zk EventThread is blocked, while the metadata-store-zk-session-watcher thread can always acquire the lock (I don't understand why; a JVM bug?).
pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKSessionWatcher.java, lines 68 to 71 in 0866c3a
pulsar/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKSessionWatcher.java, lines 86 to 108 in 0866c3a
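On the "a JVM bug?" question: Java's synchronized monitors make no fairness guarantee, so a thread that releases the lock and immediately asks for it again may keep winning over a thread that has been waiting longer. The sketch below is a hypothetical illustration only, not the change made in #17909 and not Pulsar code. It assumes two threads that repeatedly hold the same lock across a slow call, and swaps the intrinsic lock for a ReentrantLock constructed with fairness enabled. The class name FairLockSketch, the counters, and the 20 ms hold time are all made up for the example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: not taken from Pulsar or from #17909.
class FairLockSketch {
    // 'true' asks for a fair lock: lock() favors the longest-waiting thread.
    private final ReentrantLock lock = new ReentrantLock(true);
    final AtomicInteger eventRuns = new AtomicInteger();
    final AtomicInteger checkerRuns = new AtomicInteger();

    // Holds the lock across a slow call, like the synchronized method in the report.
    void checkState(AtomicInteger counter, long holdMillis) throws InterruptedException {
        lock.lock();
        try {
            counter.incrementAndGet();
            Thread.sleep(holdMillis); // stands in for the blocking, timing-out call
        } finally {
            lock.unlock();
        }
    }

    // Keeps re-acquiring the lock until the deadline passes.
    private static void loop(FairLockSketch s, AtomicInteger counter, long deadlineNanos) {
        try {
            while (System.nanoTime() < deadlineNanos) {
                s.checkState(counter, 20);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs both threads for roughly durationMillis; returns {eventRuns, checkerRuns}.
    static int[] run(long durationMillis) {
        FairLockSketch s = new FairLockSketch();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        Thread checker = new Thread(() -> loop(s, s.checkerRuns, deadline), "checker");
        Thread event = new Thread(() -> loop(s, s.eventRuns, deadline), "event");
        checker.start();
        event.start();
        try {
            checker.join();
            event.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {s.eventRuns.get(), s.checkerRuns.get()};
    }

    public static void main(String[] args) {
        int[] r = run(500);
        System.out.println("event=" + r[0] + " checker=" + r[1]);
    }
}
```

With a fair lock, both counters should advance instead of one thread monopolizing the lock. Fair locks usually cost some throughput, and moving the blocking call outside the locked region is another option; which trade-off suits ZKSessionWatcher is a judgment this sketch does not settle.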
Anything else?
No response
Are you willing to submit a PR?