[Bug] TimeoutException encountered while accessing the admin's getMessageById API in version 3.0.7 #23765
Closed
3 tasks done
Labels
type/bug
Search before asking
Read release policy
Version
OS: CentOS 7
JDK: 17
Pulsar version: 3.0.7
Minimal reproduce step
Add a test case to the testGetMessageById method in PersistentTopicsTest.java.
Specifically, you can add the following code:
Assert.expectThrows(PulsarAdminException.ServerSideErrorException.class, () -> {
    admin.topics().getMessageById(topicName1, id1.getLedgerId(), id1.getEntryId() + 10);
});
Run this test case to reproduce the issue. You will encounter the following error:
Caused by: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
    at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:347)
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getMessageById(TopicsImpl.java:1010)
    at org.apache.pulsar.broker.admin.PersistentTopicsTest.lambda$testGetMessageById$11(PersistentTopicsTest.java:1385)
    at org.testng.Assert.expectThrows(Assert.java:2440)
    ... 29 more
What did you expect to see?
The issue occurs when querying a message that does not exist, which typically happens when a topic has just been created and has not received any traffic yet. In that case, a request for information about the topic may invoke this API and end in a timeout.
For this scenario, I would expect a fast failure, rather than being blocked until the timeout occurs.
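To make the difference between the two behaviours concrete, here is a minimal sketch using only the JDK. It is not Pulsar's broker code: the class `FailFastLookupSketch`, the in-memory `ENTRIES` map, and the methods `lookupHanging`, `lookupFailFast`, and `outcome` are all hypothetical names invented for illustration. The assumption is simply that a lookup whose future is never completed for a missing entry leaves the caller waiting for its own timeout, whereas completing the future exceptionally lets the caller fail immediately.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FailFastLookupSketch {

    // Hypothetical stand-in for a ledger: entryId -> payload.
    static final Map<Long, String> ENTRIES = new ConcurrentHashMap<>(Map.of(0L, "msg-0"));

    // Problematic shape: when the entry is missing, the future is never
    // completed, so the caller can only give up when its own timeout fires.
    static CompletableFuture<String> lookupHanging(long entryId) {
        CompletableFuture<String> result = new CompletableFuture<>();
        String payload = ENTRIES.get(entryId);
        if (payload != null) {
            result.complete(payload);
        }
        return result;
    }

    // Fail-fast shape: a missing entry completes the future exceptionally
    // right away, so the caller sees an error instead of a timeout.
    static CompletableFuture<String> lookupFailFast(long entryId) {
        String payload = ENTRIES.get(entryId);
        if (payload == null) {
            return CompletableFuture.failedFuture(
                    new IllegalArgumentException("No entry with id " + entryId));
        }
        return CompletableFuture.completedFuture(payload);
    }

    // Waits at most timeoutMillis and reports how the lookup ended.
    static String outcome(CompletableFuture<String> future, long timeoutMillis) {
        try {
            return "OK:" + future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT";
        } catch (ExecutionException e) {
            return "FAILED:" + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        // Entry 10 does not exist in the hypothetical ledger.
        System.out.println(outcome(lookupHanging(10), 200));  // prints "TIMEOUT"
        System.out.println(outcome(lookupFailFast(10), 200)); // prints "FAILED:No entry with id 10"
    }
}
```

In this sketch the hanging variant always costs the caller the full wait, while the fail-fast variant returns an error without waiting at all, which is the behaviour requested here for a non-existent message.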
What did you see instead?
Instead, the getMessageById request blocks until the timeout occurs.
The hidden risk is that the timeout duration is not fixed: if the user has not configured a timeout (e.g., PulsarAdmin.builder().readTimeout(5, TimeUnit.SECONDS)) or has configured an unreasonable one, TCP connections can end up in the CLOSE_WAIT state. In extreme cases, this could lead to tcp.listenOverflow, which can affect other functionality.
[Image: a large number of connections in the CLOSE_WAIT state on the broker's port 8080]
Anything else?
No response
Are you willing to submit a PR?