[Bug] TimeoutException encountered while accessing the admin's getMessageById API in version 3.0.7 #23765

danpi · 2024-12-20T08:27:50Z

Search before asking

I searched in the issues and found nothing similar.

Read release policy

I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

OS: CentOS 7
JDK: 17
Pulsar version: 3.0.7

Minimal reproduce step

Add a test case to the testGetMessageById method in PersistentTopicsTest.java.
Specifically, you can add the following code:

    Assert.expectThrows(PulsarAdminException.ServerSideErrorException.class, () -> {
        admin.topics().getMessageById(topicName1, id1.getLedgerId(), id1.getEntryId() + 10);
    });

Run this test case to reproduce the issue. You will encounter the following error:

    Caused by: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
        at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:347)
        at org.apache.pulsar.client.admin.internal.TopicsImpl.getMessageById(TopicsImpl.java:1010)
        at org.apache.pulsar.broker.admin.PersistentTopicsTest.lambda$testGetMessageById$11(PersistentTopicsTest.java:1385)
        at org.testng.Assert.expectThrows(Assert.java:2440)
        ... 29 more

What did you expect to see?

The issue occurs when querying a message that does not exist, which usually happens when a topic has just been created and has not received any traffic yet. In such cases, querying information about the topic might invoke this API, leading to a timeout.

For this scenario, I would expect the call to fail fast rather than block until the timeout occurs.

What did you see instead?

The getMessageById request is blocked until the timeout occurs.
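The difference between blocking until the timeout and failing fast can be illustrated with a small standalone sketch. It is not the Pulsar code: the class ExceptionLossSketch and the methods readEntry, lookupLosingException, lookupFailingFast and outcome are hypothetical names made up for illustration, and the assumption (based only on the title of the linked fix, "Fix exception loss") is that an exception raised inside an asynchronous callback never reaches the future the caller is waiting on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ExceptionLossSketch {

    // Hypothetical stand-in for a read that fails when the entry does not exist.
    static String readEntry(long entryId) {
        if (entryId > 0) {
            throw new IllegalArgumentException("entry " + entryId + " does not exist");
        }
        return "payload";
    }

    // Lossy shape: if readEntry throws inside the callback, `result` is never
    // completed, so the caller can only give up when its own timeout expires.
    static CompletableFuture<String> lookupLosingException(long entryId) {
        CompletableFuture<String> result = new CompletableFuture<>();
        CompletableFuture.runAsync(() -> result.complete(readEntry(entryId)));
        return result;
    }

    // Fail-fast shape: the exception is forwarded to the future the caller holds.
    static CompletableFuture<String> lookupFailingFast(long entryId) {
        CompletableFuture<String> result = new CompletableFuture<>();
        CompletableFuture.runAsync(() -> {
            try {
                result.complete(readEntry(entryId));
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        });
        return result;
    }

    // Waits up to timeoutMillis and describes how the wait ended.
    static String outcome(CompletableFuture<String> future, long timeoutMillis)
            throws InterruptedException {
        try {
            return "ok: " + future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timeout";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Blocks for the full 500 ms, then prints "timeout".
        System.out.println(outcome(lookupLosingException(10), 500));
        // Prints "failed: entry 10 does not exist" without waiting for the timeout.
        System.out.println(outcome(lookupFailingFast(10), 500));
    }
}
```

In the first call the caller learns nothing until its own timeout expires, which matches the behavior reported here. In the second call the error surfaces as soon as the lookup fails.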

The hidden risk is that, since the timeout duration is uncertain, if the user has not configured a timeout (e.g., PulsarAdmin.builder().readTimeout(5, TimeUnit.SECONDS)) or has configured an unreasonable one, the TCP connection can enter the CLOSE_WAIT state. In extreme cases, this could lead to a tcp.listenOverflow, which can affect other functionality.

[Image: a large number of connections in the CLOSE_WAIT state on the broker's port 8080]

Anything else?

No response

Are you willing to submit a PR?

I'm willing to submit a PR!

danpi added the type/bug label Dec 20, 2024

danpi mentioned this issue Dec 20, 2024

[fix][admin] Fix exception loss in getMessageId method #23766

Merged

lhotari closed this as completed in #23766 Dec 21, 2024
