Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] Fix items in dispatcher.recentlyJoinedConsumers are out-of-order, which may cause a delivery stuck #23802

Merged

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Jan 2, 2025

Motivation

Background

  • After all, consumers are off-line, the cursor will be triggered a rewind
    • If there is a pending read is in-progress, the action rewind will be delayed after the pending read is complete.
  • The variable dispatcher.recentlyJoinedConsumers of key_shared mode guarantees delivery in order, it records the cursor.readPosition when consumers join.

Issue: a case that makes items in dispatcher.recentlyJoinedConsumers out-of-order, the steps to reproduce the issue are as follows:

  • there are 3 consumers: c1, c2, c3
    • c1 and c2 consumed all messages that they received.
    • c3 is stuck
    • LAC: 3:299, cursor.readPosition: 3:300, mard-deleted-position: 3:100
  • The dispatcher will replay the messages that were sent to c3 after c3 is removed.
    • At the moment, c1 and c2 are also off-line, which will trigger a rewind
    • Since there is a pending read, the rewind will be delayed.
  • consumer-4 joined, its recentlyJoinedPosition was set to 3:300
  • The delayed rewind was triggered.
    • LAC: 3:299, cursor.readPosition: 3:101, mard-deleted-position: 3:100
  • consumer-5 joined, its recentlyJoinedPosition was set to 3:101
  • The issue occured.

you can reproduce the issue by the new test NonEntryCacheKeySharedSubscriptionV30Test

Modifications

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

…t-of-order, which may cause a delivery stuck
@shibd
Copy link
Member

shibd commented Jan 2, 2025

/pulsarbot rerun-failure-checks

@codecov-commenter
Copy link

codecov-commenter commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.

Project coverage is 74.12%. Comparing base (bbc6224) to head (82b9eea).
Report is 832 commits behind head on master.

Files with missing lines Patch % Lines
...ntStickyKeyDispatcherMultipleConsumersClassic.java 76.92% 1 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23802      +/-   ##
============================================
+ Coverage     73.57%   74.12%   +0.55%     
+ Complexity    32624     2374   -30250     
============================================
  Files          1877     1853      -24     
  Lines        139502   143456    +3954     
  Branches      15299    16291     +992     
============================================
+ Hits         102638   106338    +3700     
+ Misses        28908    28716     -192     
- Partials       7956     8402     +446     
Flag Coverage Δ
inttests 26.67% <0.00%> (+2.08%) ⬆️
systests 23.17% <0.00%> (-1.16%) ⬇️
unittests 73.64% <76.92%> (+0.80%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ntStickyKeyDispatcherMultipleConsumersClassic.java 88.65% <76.92%> (ø)

... and 1022 files with indirect coverage changes

@codelipenghui
Copy link
Contributor

@poorbarcode Please also help summarize the discussion between us to the PR description. I think it should be added to the Modifications based on your commit 303e562

@poorbarcode
Copy link
Contributor Author

@poorbarcode Please also help summarize the discussion between us to the PR description. I think it should be added to the Modifications based on your commit 303e562

Added

@poorbarcode poorbarcode merged commit 3d71c87 into apache:master Jan 3, 2025
52 checks passed
poorbarcode added a commit that referenced this pull request Jan 3, 2025
…t-of-order, which may cause a delivery stuck (#23802)

(cherry picked from commit 3d71c87)
poorbarcode added a commit that referenced this pull request Jan 3, 2025
…t-of-order, which may cause a delivery stuck (#23802)

(cherry picked from commit 3d71c87)
poorbarcode added a commit that referenced this pull request Jan 3, 2025
…t-of-order, which may cause a delivery stuck (#23802)

(cherry picked from commit 3d71c87)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 7, 2025
…t-of-order, which may cause a delivery stuck (apache#23802)

(cherry picked from commit 3d71c87)
(cherry picked from commit 5cec5ed)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 8, 2025
…t-of-order, which may cause a delivery stuck (apache#23802)

(cherry picked from commit 3d71c87)
(cherry picked from commit 5cec5ed)
poorbarcode added a commit to poorbarcode/pulsar that referenced this pull request Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants