
[Bug] Redelivering messages doesn't take dispatcherMaxReadSizeBytes into account in Shared and Key_Shared subscriptions #23505

Open
lhotari opened this issue Oct 23, 2024 · 7 comments
Labels: type/bug
@lhotari (Member) commented Oct 23, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

all released versions

Minimal reproduce step

Problem description:
In a Shared subscription, messages are added to the replay queue when a consumer disconnects. In a Key_Shared subscription, the replay queue is also used when messages cannot be dispatched to the target consumer because it has insufficient permits or its hash is blocked.
The problem with the current implementation is that the dispatcherMaxReadSizeBytes setting (default 5 MB) is not taken into account when reading entries for redelivery. As a result, consumers can receive a very large batch of messages at once when those reads succeed. The exact impact of this isn't fully known yet, but ignoring dispatcherMaxReadSizeBytes goes against the design: the setting exists so that each dispatcher makes smaller incremental progress and the broker can serve all active dispatchers one by one as fairly as possible.
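To make the expected behavior concrete, here is a minimal sketch of capping a replay read by cumulative entry size. This is not Pulsar's actual dispatcher code; `Entry`, `drainUpTo`, and the queue layout are illustrative names, and the real fix would apply the limit where replay reads are issued (e.g. around `readMoreEntries()`):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative sketch: drain a replay queue, but stop once the cumulative
// entry size would exceed dispatcherMaxReadSizeBytes, mirroring how normal
// reads are capped. Names here are hypothetical, not Pulsar API.
public class ReplayReadCap {
    record Entry(long id, int sizeBytes) {}

    static List<Entry> drainUpTo(Queue<Entry> replayQueue, long maxReadSizeBytes) {
        List<Entry> toRead = new ArrayList<>();
        long totalBytes = 0;
        while (!replayQueue.isEmpty()) {
            Entry next = replayQueue.peek();
            // Always allow at least one entry so the dispatcher makes
            // progress even when a single entry is larger than the cap.
            if (!toRead.isEmpty() && totalBytes + next.sizeBytes() > maxReadSizeBytes) {
                break;
            }
            totalBytes += next.sizeBytes();
            toRead.add(replayQueue.poll());
        }
        return toRead;
    }

    public static void main(String[] args) {
        Queue<Entry> q = new ArrayDeque<>();
        for (long i = 0; i < 10; i++) {
            q.add(new Entry(i, 2 * 1024 * 1024)); // 2 MiB each
        }
        List<Entry> batch = drainUpTo(q, 5L * 1024 * 1024); // 5 MiB cap
        System.out.println(batch.size()); // 2 entries fit under the cap
        System.out.println(q.size());     // 8 remain for the next read cycle
    }
}
```

Without such a cap, the whole replay queue can be read in one pass, which is the oversized-batch behavior this issue describes.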

What did you expect to see?

That dispatcherMaxReadSizeBytes is used when redelivering messages.

What did you see instead?

dispatcherMaxReadSizeBytes is ignored.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari lhotari added the type/bug The PR fixed a bug or issue reported a bug label Oct 23, 2024
@lhotari lhotari added this to the 4.1.0 milestone Oct 23, 2024
@lhotari lhotari self-assigned this Oct 23, 2024
@ZhaoGuorui666 (Contributor) commented Oct 23, 2024

Hi lhotari,

I have been looking into this issue today and noticed that you have self-assigned it. I would like to discuss the following points with you:

Where is the dispatcherMaxReadSizeBytes limit applied in the code?

  • Before inserting into the replay queue? [screenshot]

  • Within the readMoreEntries() method? (However, I noticed that the calculateToRead() method already imposes a limit at the beginning.) [screenshot]

Looking forward to your PR and learning from your approach to resolving this issue.

Thank you!

@lhotari (Member, Author) commented Oct 23, 2024

> Where is the dispatcherMaxReadSizeBytes limit applied in the code? Before inserting into the replay queue, or within the readMoreEntries() method?

@ZhaoGuorui666 I'll share details, possibly tomorrow. Just curious, are you facing this issue?

@ZhaoGuorui666 (Contributor) commented Oct 23, 2024

> @ZhaoGuorui666 I'll share details, possibly tomorrow. Just curious, are you facing this issue?

No, I'm just trying to solve an issue and want to contribute to Pulsar. I'm just starting out now, I'll take a look at your previous PRs and learn from them.

@lhotari (Member, Author) commented Oct 23, 2024

> No, I'm just trying to solve an issue and want to contribute to Pulsar. I'm just starting out now, I'll take a look at your previous PRs and learn from them.

@ZhaoGuorui666 One good way to start making valuable contributions is to fix flaky tests. We have plenty of them: https://github.com/apache/pulsar/issues?q=is%3Aissue+is%3Aopen+flaky . In many cases there is also a production code issue causing the flakiness, so you'll learn a lot about Pulsar while addressing flaky tests. Usually you can reproduce a flaky test by temporarily replacing @Test with @Test(invocationCount = 20) so that the same method runs multiple times. In certain cases, other approaches are required for reproducing locally.
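The invocationCount trick works because a flaky test usually fails only once in many runs, so invoking the same method repeatedly surfaces the failure. A TestNG-free sketch of that idea (the simulated test, its failure rate, and all names here are illustrative, not Pulsar or TestNG code):

```java
import java.util.Random;

// Sketch of why @Test(invocationCount = 20) helps: a test that passes most
// of the time will often fail at least once when invoked many times in a row.
public class InvocationCountSketch {
    // Simulated flaky "test" that fails roughly 1 time in 10.
    static boolean flakyTest(Random rng) {
        return rng.nextInt(10) != 0;
    }

    // Invoke the test repeatedly, like @Test(invocationCount = n),
    // and count how many invocations failed.
    static int runRepeatedly(int invocationCount, long seed) {
        Random rng = new Random(seed);
        int failures = 0;
        for (int i = 0; i < invocationCount; i++) {
            if (!flakyTest(rng)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // A single invocation often passes; 20 invocations expose
        // flakiness far more reliably.
        System.out.println("failures in 20 runs: " + runRepeatedly(20, 42L));
    }
}
```

In a real Pulsar test class, the equivalent step is just temporarily changing the annotation to `@Test(invocationCount = 20)` and reverting it before submitting a PR.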

@lhotari (Member, Author) commented Oct 23, 2024

@ZhaoGuorui666 For the priority of flaky tests, you can check one of the recent reports in https://github.com/lhotari/pulsar-flakes. I triggered new runs to get reports of the most flaky tests in the last 2 weeks (runs: https://github.com/lhotari/pulsar-flakes/actions).

@ZhaoGuorui666 (Contributor) commented Oct 23, 2024

> @ZhaoGuorui666 One good way to start valuable contributions is to fix flaky tests. Usually you can reproduce a flaky test by temporarily replacing @Test with @Test(invocationCount = 20).

Thank you for your suggestion. This method sounds very helpful.
