[improve][broker] PIP-192 Added SplitScheduler and DefaultNamespaceBundleSplitStrategyImpl #19622

Merged
merged 6 commits into from
Mar 14, 2023

Conversation

heesung-sn
Contributor

@heesung-sn heesung-sn commented Feb 24, 2023

Master Issue: #16691

Motivation

We will start raising PRs to implement PIP-192, #16691

Modifications

This PR implements

  • SplitScheduler
  • DefaultNamespaceBundleSplitStrategyImpl
  • SplitManager
  • and their unit tests.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added unit tests.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

We will have separate PRs to update the Doc later.

Matching PR in forked repository

PR in forked repository: heesung-sn#30

@github-actions github-actions bot added the doc-not-needed (Your PR changes do not impact docs) label Feb 24, 2023
@@ -847,7 +847,6 @@ protected void splitServiceUnitOnceAndRetry(NamespaceService namespaceService,
return null;
});
}

Member

Useless change.

Contributor Author

Updated.

Comment on lines 82 to 89
var bundleHighTrafficIterator =
        bundleHighTrafficFrequency.entrySet().iterator();
while (bundleHighTrafficIterator.hasNext()) {
    String bundle = bundleHighTrafficIterator.next().getKey();
    if (!bundleStatsMap.containsKey(bundle)) {
        bundleHighTrafficIterator.remove();
    }
}
Member

Suggested change
- var bundleHighTrafficIterator =
-         bundleHighTrafficFrequency.entrySet().iterator();
- while (bundleHighTrafficIterator.hasNext()) {
-     String bundle = bundleHighTrafficIterator.next().getKey();
-     if (!bundleStatsMap.containsKey(bundle)) {
-         bundleHighTrafficIterator.remove();
-     }
- }
+ bundleHighTrafficFrequency.keySet().retainAll(bundleStatsMap.keySet());

Contributor Author

Updated.
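
The one-line suggestion in this thread works because `Map.keySet()` returns a live view of the map, so removing keys from the view removes the matching entries. The following is a minimal, standalone sketch of that equivalence; the class name, method names, and sample bundle names are made up for illustration and are not taken from the PR.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class RetainAllSketch {
    // Original form from the diff: drop entries whose key is missing from the stats map.
    static void pruneWithIterator(Map<String, Integer> frequency, Map<String, ?> stats) {
        Iterator<Map.Entry<String, Integer>> it = frequency.entrySet().iterator();
        while (it.hasNext()) {
            if (!stats.containsKey(it.next().getKey())) {
                it.remove();
            }
        }
    }

    // Suggested form: keySet() is a live view, so retainAll also removes the map entries.
    static void pruneWithRetainAll(Map<String, Integer> frequency, Map<String, ?> stats) {
        frequency.keySet().retainAll(stats.keySet());
    }

    // Hypothetical sample data standing in for bundleHighTrafficFrequency.
    static Map<String, Integer> sampleFrequency() {
        Map<String, Integer> m = new HashMap<>();
        m.put("bundle-a", 3);
        m.put("bundle-b", 1);
        m.put("bundle-c", 2);
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> stats = Map.of("bundle-a", "stats", "bundle-c", "stats");
        Map<String, Integer> a = sampleFrequency();
        Map<String, Integer> b = sampleFrequency();
        pruneWithIterator(a, stats);
        pruneWithRetainAll(b, stats);
        // Both maps now hold only bundle-a and bundle-c.
        System.out.println(a.equals(b));
    }
}
```

Note that `retainAll` needs a mutable map; it would throw on an unmodifiable one such as the result of `Map.of`.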

for (SplitDecision decision : decisions) {
if (decision.getLabel() == Success) {
var split = decision.getSplit();
futures.add(serviceUnitStateChannel.publishSplitEventAsync(split)
Member

Do we need to wait until the split is finished (i.e., until the Deleted message is received)?

Contributor Author

Great point. I see that UnloadScheduler does not wait for unload completion now.

However, I agree that we had better wait for completion (wait on channel.getOwnerAsync) to confirm the completion on the same dedicated thread.

The class name SplitScheduler would sound counterintuitive if the logic both schedules splits and waits for completion. Perhaps a better name would be SplitManager.

Should we make this class name change and add the waiting logic in this PR, or in a separate PR (which would also refactor UnloadScheduler)?

Member

Contributor Author

Let's continue this discussion in your PR, as you have a proposal for this waiting logic in that PR.

@heesung-sn
Contributor Author

I will add the waiting logic once this PR gets merged. #19538
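
As a rough illustration of the waiting logic discussed in this thread, the sketch below chains a "publish" future with a "split finished" future and bounds the wait with a timeout. It is only an assumption about the general shape: `publishAndWait`, `timedOut`, and both future parameters are hypothetical stand-ins, not the PR's API, and the real code waits on the service unit state channel instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SplitWaitSketch {
    // Hypothetical: once the split event is published, wait for the split itself
    // to finish, but give up after timeoutMs milliseconds.
    static CompletableFuture<String> publishAndWait(CompletableFuture<Void> published,
                                                   CompletableFuture<String> splitDone,
                                                   long timeoutMs) {
        return published
                .thenCompose(ignored -> splitDone)
                .orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Reports whether the future failed because the timeout elapsed.
    static boolean timedOut(CompletableFuture<?> future) {
        try {
            future.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> published = CompletableFuture.completedFuture(null);
        // A split that never completes triggers the timeout path.
        CompletableFuture<String> never = new CompletableFuture<>();
        System.out.println(timedOut(publishAndWait(published, never, 50)));
        // A split that has already finished completes normally.
        CompletableFuture<String> done = CompletableFuture.completedFuture("split-done");
        System.out.println(publishAndWait(published, done, 50).join());
    }
}
```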

}

if (counter.updatedAt() > counterLastUpdatedAt) {
splitMetrics.set(counter.toMetrics(pulsar.getAdvertisedAddress()));
Member

Why don't we use brokerRegistry().getBrokerId()?

Contributor Author

To be consistent with the other metrics code, we use pulsar.getAdvertisedAddress() here.

Member

Ok, I see.

}
}

if (counter.updatedAt() > counterLastUpdatedAt) {
Member

This should be updated in FutureUtil.waitForAll(futures).whenComplete.

Contributor Author

Updated to do the sync waiting after FutureUtil.waitForAll.
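
The point of this thread is that the metrics snapshot should be taken only once every split future has settled. A minimal sketch of that ordering follows, using the JDK's `CompletableFuture.allOf` as a stand-in for Pulsar's `FutureUtil.waitForAll` (assumed here to behave similarly); the class name, `publishAll`, and the metrics string format are invented for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

class MetricsAfterAllSketch {
    // Publishes a metrics snapshot only after every split future has completed,
    // whether it completed normally or exceptionally.
    static CompletableFuture<Void> publishAll(List<CompletableFuture<Void>> futures,
                                             AtomicReference<String> metrics) {
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .whenComplete((v, ex) -> {
                    long failed = futures.stream()
                            .filter(CompletableFuture::isCompletedExceptionally)
                            .count();
                    metrics.set("total=" + futures.size() + ",failed=" + failed);
                });
    }

    public static void main(String[] args) {
        List<CompletableFuture<Void>> futures = List.of(
                CompletableFuture.<Void>completedFuture(null),
                CompletableFuture.<Void>failedFuture(new RuntimeException("split failed")));
        AtomicReference<String> metrics = new AtomicReference<>("none");
        publishAll(futures, metrics);
        System.out.println(metrics.get());
    }
}
```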

}
FutureUtil.waitForAll(futures).exceptionally(ex -> {
log.error("Failed to wait for split events to persist.", ex);
counter.update(Failure, Unknown);
Member

The counter is already updated when serviceUnitStateChannel.publishSplitEventAsync(split) fails, so this update is unnecessary.

Contributor Author

Updated.

@heesung-sn heesung-sn force-pushed the pip-192-split-shceduler branch from 73be2ba to c856344 Compare March 8, 2023 02:14
}
});
return new InFlightSplitRequest(decision, future);
}).future);
Member

We should update the counter here, because the eventPubFuture might have failed.

Contributor Author

Moved the pub failure handling logic to SplitManager.

.exceptionally(e -> {
log.error("Failed to publish the bundle split event for bundle:{}. Skipping wait.", bundle);
counter.update(Failure, Unknown);
return null;
Member

Using exceptionally will cause the returned future to lose the exception info. We should use whenComplete here.

Contributor Author

Updated.
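
The difference the reviewer points out can be seen with plain JDK futures: `exceptionally` recovers from the failure and returns a future that completes normally, while `whenComplete` only observes the outcome and passes the failure on. A minimal sketch follows, where `publishFailing` is a hypothetical stand-in for a failed publishSplitEventAsync call and the counter is a plain `AtomicInteger`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

class WhenCompleteSketch {
    // Hypothetical stand-in for publishing a split event; it always fails here.
    static CompletableFuture<Void> publishFailing() {
        return CompletableFuture.failedFuture(new IllegalStateException("publish failed"));
    }

    // exceptionally() recovers: the returned future completes normally with null,
    // so downstream callers can no longer see the failure.
    static CompletableFuture<Void> withExceptionally(AtomicInteger failures) {
        return publishFailing().exceptionally(e -> {
            failures.incrementAndGet();
            return null;
        });
    }

    // whenComplete() only observes: the returned future still completes exceptionally.
    static CompletableFuture<Void> withWhenComplete(AtomicInteger failures) {
        return publishFailing().whenComplete((v, e) -> {
            if (e != null) {
                failures.incrementAndGet();
            }
        });
    }

    public static void main(String[] args) {
        AtomicInteger failures = new AtomicInteger();
        System.out.println(withExceptionally(failures).isCompletedExceptionally()); // false
        System.out.println(withWhenComplete(failures).isCompletedExceptionally());  // true
        System.out.println(failures.get());                                         // 2
    }
}
```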

var future = inFlightSplitRequest.future;
if (!future.isDone()) {
if (ex != null) {
counter.update(Failure, Unknown);
Member

If we update the counter here, it will be updated twice when the future completes exceptionally.

Member

I suggest moving the counter update to waitAsync.

Contributor Author

Can you clarify how we are updating this failure counter twice?

Contributor Author

Oh, I see. I think we need to clean up this part.

Contributor Author

Nice catch. Updated.
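
The double update raised in this thread can be reproduced in isolation: when two chained `whenComplete` callbacks each count the same failure, one failed future is recorded twice. The sketch below, with invented names and an `AtomicInteger` in place of the PR's counter, contrasts that with counting in a single place.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

class SingleCountSketch {
    // Both stages see the same exception, so one failure is counted twice.
    static int countedTwice() {
        AtomicInteger failures = new AtomicInteger();
        CompletableFuture<Void> publish =
                CompletableFuture.failedFuture(new RuntimeException("boom"));
        publish.whenComplete((v, e) -> {
            if (e != null) {
                failures.incrementAndGet();
            }
        }).whenComplete((v, e) -> {
            if (e != null) {
                failures.incrementAndGet();
            }
        });
        return failures.get();
    }

    // Only the final stage updates the counter; the earlier stage just observes.
    static int countedOnce() {
        AtomicInteger failures = new AtomicInteger();
        CompletableFuture<Void> publish =
                CompletableFuture.failedFuture(new RuntimeException("boom"));
        publish.whenComplete((v, e) -> {
            // Logging could go here, but no counter update.
        }).whenComplete((v, e) -> {
            if (e != null) {
                failures.incrementAndGet();
            }
        });
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println(countedTwice()); // 2
        System.out.println(countedOnce());  // 1
    }
}
```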

log.warn("Timed out while waiting for the bundle split event: {}", bundle, ex);
}
});
return new InFlightSplitRequest(decision, future);
Member

We no longer need the InFlightSplitRequest object, right? We already pass the decision in whenComplete.

Contributor Author

Agreed. Updated.

@heesung-sn heesung-sn force-pushed the pip-192-split-shceduler branch from e71ea9f to 1ac2868 Compare March 9, 2023 05:22
@Demogorgon314 Demogorgon314 added the type/enhancement (The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages), area/broker, and ready-to-test labels Mar 9, 2023
@Demogorgon314 Demogorgon314 reopened this Mar 9, 2023
Contributor

@gaoran10 gaoran10 left a comment

LGTM

@Technoboy-
Contributor

/pulsarbot run-failure-checks

@Demogorgon314 Demogorgon314 merged commit 9a85dea into apache:master Mar 14, 2023
@heesung-sn heesung-sn deleted the pip-192-split-shceduler branch April 2, 2024 17:44
Labels
area/broker doc-not-needed Your PR changes do not impact docs ready-to-test type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
4 participants