Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix future of clear delayed message can't complete #20075

Merged
merged 4 commits into from
Apr 15, 2023

Conversation

coderzc
Copy link
Member

@coderzc coderzc commented Apr 12, 2023

PIP: #16763

Motivation

If newTracker() throws an exception, then the future will can't complete forever, this will lead to the topic can't be deleted until restart broker.

image

image

Modifications

Use cleanResidualSnapshots instead of creating a tracker to clear residual snapshots if the tracker does not exist.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

clear delayed message  uncomplete
@coderzc coderzc self-assigned this Apr 12, 2023
@coderzc coderzc added type/bug The PR fixed a bug or issue reported a bug area/broker labels Apr 12, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 12, 2023
@coderzc coderzc added this to the 3.0.0 milestone Apr 12, 2023
@coderzc coderzc added ready-to-test and removed doc-not-needed Your PR changes do not impact docs labels Apr 12, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 12, 2023
@coderzc coderzc closed this Apr 12, 2023
@coderzc coderzc reopened this Apr 12, 2023
@cbornet
Copy link
Contributor

cbornet commented Apr 12, 2023

@coderzc we are now in code freeze for 3.0. So unless this is a blocker for the release, this change should target 3.0.1/3.1. Please change the milestone again to 3.0 if you believe that's a blocker.

@cbornet cbornet modified the milestones: 3.0.0, 3.1.0 Apr 12, 2023
@coderzc
Copy link
Member Author

coderzc commented Apr 12, 2023

@coderzc we are now in code freeze for 3.0. So unless this is a blocker for the release, this change should target 3.0.1/3.1. Please change the milestone again to 3.0 if you believe that's a blocker.

This bug will lead to a topic can't be deleted until restart broker, so I think it is a blocker.

Copy link
Contributor

@poorbarcode poorbarcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test?

try {
if (delayedDeliveryTracker.isEmpty()) {
delayedDeliveryTracker = Optional
.of(topic.getBrokerService().getDelayedDeliveryTrackerFactory().newTracker(this));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we never touch the method trackDelayedDelivery(), then the variable delayedDeliveryTracker is always is empty, right?

If yes, why need we initialize it when we should clear delayed messages? Can we just return a completed future?

Copy link
Member Author

@coderzc coderzc Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@poorbarcode Because the dispatcher was closed before unsubscribe was called, this lead to the tracker reference being lost, so we need to reinitialize the tracker to clean up residual persistent data.

Please see: line-1320、line-1356

subscriptions.forEach((s, sub) -> futures.add(sub.disconnect()));
if (closeIfClientsConnected) {
replicators.forEach((cluster, replicator) -> futures.add(replicator.disconnect()));
shadowReplicators.forEach((__, replicator) -> futures.add(replicator.disconnect()));
producers.values().forEach(producer -> futures.add(producer.disconnect()));
}
FutureUtil.waitForAll(futures).thenRunAsync(() -> {
closeClientFuture.complete(null);
}, getOrderedExecutor()).exceptionally(ex -> {
log.error("[{}] Error closing clients", topic, ex);
unfenceTopicToResume();
closeClientFuture.completeExceptionally(ex);
return null;
});
closeClientFuture.thenAccept(__ -> {
CompletableFuture<Void> deleteTopicAuthenticationFuture = new CompletableFuture<>();
brokerService.deleteTopicAuthenticationWithRetry(topic, deleteTopicAuthenticationFuture, 5);
deleteTopicAuthenticationFuture.thenCompose(ignore -> deleteSchema())
.thenCompose(ignore -> {
if (!SystemTopicNames.isTopicPoliciesSystemTopic(topic)
&& brokerService.getPulsar().getConfiguration().isSystemTopicEnabled()) {
return deleteTopicPolicies();
} else {
return CompletableFuture.completedFuture(null);
}
})
.thenCompose(ignore -> transactionBufferCleanupAndClose())
.whenComplete((v, ex) -> {
if (ex != null) {
log.error("[{}] Error deleting topic", topic, ex);
unfenceTopicToResume();
deleteFuture.completeExceptionally(ex);
} else {
List<CompletableFuture<Void>> subsDeleteFutures = new ArrayList<>();
subscriptions.forEach((sub, p) -> subsDeleteFutures.add(unsubscribe(sub)));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the dispatcher was closed before unsubscribe was called, this lead to the tracker reference being lost, so we need to reinitialize the tracker to clean up residual persistent data.

I see. Another question: the method new tracker will call recover internally, right? If yes, I feel it is an expensive task. Can we only set the Tracker to an unavailable state when closing the dispatcher to avoid recover the second time?

Copy link
Member Author

@coderzc coderzc Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I think we should use another way to clean up data to avoid unnecessary recover, such as add a clean method in the factory.

@coderzc coderzc requested a review from poorbarcode April 12, 2023 16:35
@coderzc
Copy link
Member Author

coderzc commented Apr 13, 2023

@poorbarcode I use cleanResidualSnapshots instead of create a tracker to clear residual snapshots if the tracker does not exist, please take a look again.

@coderzc coderzc force-pushed the fix_clear_delayed_message branch from 19ff631 to ad34c39 Compare April 13, 2023 05:41
@codecov-commenter
Copy link

codecov-commenter commented Apr 13, 2023

Codecov Report

Merging #20075 (4220f69) into master (a332fea) will increase coverage by 37.34%.
The diff coverage is 84.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #20075       +/-   ##
=============================================
+ Coverage     35.58%   72.93%   +37.34%     
- Complexity    12423    31901    +19478     
=============================================
  Files          1691     1868      +177     
  Lines        128771   138309     +9538     
  Branches      14044    15211     +1167     
=============================================
+ Hits          45822   100874    +55052     
+ Misses        76931    29422    -47509     
- Partials       6018     8013     +1995     
Flag Coverage Δ
inttests 24.29% <0.00%> (?)
systests 24.88% <0.00%> (-0.02%) ⬇️
unittests 72.23% <84.00%> (+39.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rg/apache/pulsar/broker/delayed/bucket/Bucket.java 88.70% <ø> (+88.70%) ⬆️
...r/delayed/bucket/BucketDelayedDeliveryTracker.java 82.90% <ø> (+82.90%) ⬆️
...sar/broker/service/persistent/PersistentTopic.java 79.39% <70.00%> (+28.57%) ⬆️
...r/delayed/BucketDelayedDeliveryTrackerFactory.java 93.54% <90.90%> (+93.54%) ⬆️
...sistent/PersistentDispatcherMultipleConsumers.java 75.83% <100.00%> (+23.42%) ⬆️

... and 1438 files with indirect coverage changes

@coderzc coderzc modified the milestones: 3.1.0, 3.0.0 Apr 14, 2023
@mattisonchao mattisonchao merged commit 6152cc8 into apache:master Apr 15, 2023
coderzc added a commit that referenced this pull request Apr 17, 2023
@coderzc coderzc changed the title [fix][broker] Fix avoid future of clear delayed message can't complete [fix][broker] Fix future of clear delayed message can't complete Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker doc-not-needed Your PR changes do not impact docs ready-to-test type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants