[Bug] TimeoutException encountered while accessing the admin's getMessageById API in version 3.0.7 #23765

Closed
3 tasks done
danpi opened this issue Dec 20, 2024 · 0 comments · Fixed by #23766
Labels
type/bug The PR fixed a bug or issue reported a bug

danpi commented Dec 20, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

OS: CentOS 7
JDK: 17
Pulsar version: 3.0.7

Minimal reproduce step

  1. Add a test case to the testGetMessageById method in PersistentTopicsTest.java.

  2. Specifically, you can add the following code:
    Assert.expectThrows(PulsarAdminException.ServerSideErrorException.class, () -> {
        admin.topics().getMessageById(topicName1, id1.getLedgerId(), id1.getEntryId() + 10);
    });

  3. Run this test case to reproduce the issue. You will encounter the following error:
    Caused by: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
        at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:347)
        at org.apache.pulsar.client.admin.internal.TopicsImpl.getMessageById(TopicsImpl.java:1010)
        at org.apache.pulsar.broker.admin.PersistentTopicsTest.lambda$testGetMessageById$11(PersistentTopicsTest.java:1385)
        at org.testng.Assert.expectThrows(Assert.java:2440)
        ... 29 more

What did you expect to see?

The issue occurs when trying to query a non-existent message, which usually happens when a topic is newly created but hasn't received any traffic yet. In such cases, querying some information about the topic might invoke this API, leading to a timeout.

For this scenario, I would expect a fast failure, rather than being blocked until the timeout occurs.
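
To make the difference between the two behaviours concrete, here is a minimal, self-contained sketch using only the JDK. It is not Pulsar code: `lookupThatHangs` and `lookupThatFailsFast` are hypothetical stand-ins, and the assumption that the hang comes from a future that is never completed is only an illustration, since this report does not identify the root cause.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FailFastSketch {
    /** Hypothetical lookup that never completes its future when the entry is missing. */
    static CompletableFuture<String> lookupThatHangs(boolean entryExists) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (entryExists) {
            future.complete("payload");
        }
        // Missing entry: nothing completes the future, so the caller waits for its timeout.
        return future;
    }

    /** Hypothetical lookup that reports a missing entry immediately. */
    static CompletableFuture<String> lookupThatFailsFast(boolean entryExists) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (entryExists) {
            future.complete("payload");
        } else {
            future.completeExceptionally(new IllegalStateException("Message not found"));
        }
        return future;
    }

    /** Waits up to timeoutMillis for the future and describes how the wait ended. */
    static String outcome(CompletableFuture<String> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "completed";
        } catch (TimeoutException e) {
            return "timed out";
        } catch (ExecutionException e) {
            return "failed fast: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(lookupThatHangs(false), 200));
        System.out.println(outcome(lookupThatFailsFast(false), 200));
    }
}
```

The first call only returns once the caller's own timeout expires, while the second surfaces the error right away, which is the behaviour I would expect from getMessageById for a non-existent message.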

What did you see instead?

Instead, the getMessageById request blocks until the client-side timeout fires.

The hidden risk is that the timeout duration is not fixed: if the user has not configured a timeout (e.g., PulsarAdmin.builder().readTimeout(5, TimeUnit.SECONDS)) or has configured an unreasonable one, TCP connections can end up in the CLOSE_WAIT state. In extreme cases, this could lead to a tcp.listenOverflow, which can affect other functionality.
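
As a rough illustration of why an explicit client-side timeout matters, the following JDK-only sketch sends a request to a local endpoint that never answers. It deliberately uses `java.net.http.HttpClient` and the JDK's built-in `HttpServer` as stand-ins rather than PulsarAdmin or the broker; the `/stuck` path and the `requestWithTimeout` helper are hypothetical names made up for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;
import java.util.concurrent.CountDownLatch;

class ClientTimeoutSketch {
    /** Sends a GET to a local endpoint that never responds and describes how the call ended. */
    static String requestWithTimeout(Duration timeout) {
        CountDownLatch release = new CountDownLatch(1);
        HttpServer server = null;
        try {
            // Port 0 lets the OS pick a free port on the loopback interface.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            // Hypothetical handler standing in for a request that is never answered.
            server.createContext("/stuck", exchange -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    exchange.close();
                }
            });
            server.start();

            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/stuck");
            // The per-request timeout is the only thing that bounds the wait here.
            HttpRequest request = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            return "completed";
        } catch (HttpTimeoutException e) {
            return "timed out";
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            release.countDown(); // unblock the stuck handler before shutting down
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(requestWithTimeout(Duration.ofMillis(500)));
    }
}
```

Without the `timeout(...)` call the request in this sketch would wait for as long as the handler stays blocked, which mirrors the situation described above when no reasonable read timeout is configured on the admin client.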

The following image shows a large number of connections in the CLOSE_WAIT state on the broker's 8080 port:
image

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@danpi danpi added the type/bug The PR fixed a bug or issue reported a bug label Dec 20, 2024