Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][flaky-test]AdminApi2Test.testDeleteTenant #17070

Merged
merged 3 commits into from
Aug 12, 2022

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Aug 11, 2022

Fixes #17050

Master Issue:

Motivation

If enabled system topic, when a topic is deleted, these events are triggered serially:

  • delete topic
  • check topic ownership
  • load bundle
  • trigger event: NamespaceBundleOwnershipListener.onLoad()
  • init system topic client
  • subscribe system topic
  • (High loght)create topic: __change_events

Then the exception "Directory not empty" occurs when deleting the namespace.

admin.topics().deletePartitionedTopic(topic);
assertTrue(admin.topics().getList(namespace).isEmpty());
// delete namespace
admin.namespaces().deleteNamespace(namespace, false);

Even if will try to delete topics that type system when deleting namespace, there is still the problem of multi-thread concurrency.

if (!topics.isEmpty()) {
for (String topic : topics) {
try {
futures.add(pulsar().getAdminClient().topics().deleteAsync(topic, true, true));
} catch (Exception ex) {

(High light) I think this is also a bug

when the user manually creates the topic but does not use it, and then quickly deletes the namespace, there is a small probability left topic __change_events

I try to fix the flaky test this way: make delete namespace after __change_events is successfully created, but there still has another race condition: the __change_events/__compaction async creates and __change_events delete by namespace delete. Therefore, I will disabled systemTopic in method testDeleteTenant to solve this flaky test.

Modifications

  • This PR only fixed the flaky test: disabled systemTopic in method testDeleteTenant.

  • As for how to solve the bug, I think we need to discuss

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 11, 2022
TreeSet<String> topics = new TreeSet<>();
topics.addAll(pulsar.getNamespaceService()
.getListOfPersistentTopics(NamespaceName.get(namespace)).join());
topics.addAll(pulsar.getPulsarResources().getNamespaceResources().getPartitionedTopicResources()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think query getListOfPersistentTopics is enough, because __change_events is persistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we don't need to check partitioned topics? Because we will eventually create the partitons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Technoboy-

I think query getListOfPersistentTopics is enough, because __change_events is persistent.

already remove the logic that queries non-persistent topic

Hi @codelipenghui

And we don't need to check partitioned topics? Because we will eventually create the partitons.

already add the partitioned topics check.

@codelipenghui
Copy link
Contributor

when the user manually creates the topic but does not use it, and then quickly deletes the namespace, there is a small probability left topic __change_events

Here looks like a race condition? The system topic created after the namespace has been deleted?

@poorbarcode
Copy link
Contributor Author

Here looks like a race condition? The system topic created after the namespace has been deleted?

both correct.

@Technoboy-
Copy link
Contributor

when the user manually creates the topic but does not use it, and then quickly deletes the namespace, there is a small probability left topic __change_events

Here looks like a race condition? The system topic created after the namespace has been deleted?

When deleting the partitioned topics, we have to check the ownership first and find that the bundle is not owned and then try to own the bundle and then trigger the system topic created async.

@poorbarcode
Copy link
Contributor Author

poorbarcode commented Aug 12, 2022

I try to fix the flaky test this way: make delete namespace after __change_events is successfully created, but there still has another race condition: the __change_events/__compaction async creates and __change_events delete by namespace delete. Therefore, I will disable systemTopic in method testDeleteTenant to solve this flaky test.

@Technoboy- @codelipenghui Could you review this PR again?

@Technoboy-
Copy link
Contributor

when the user manually creates the topic but does not use it, and then quickly deletes the namespace, there is a small probability left topic __change_events

Here looks like a race condition? The system topic created after the namespace has been deleted?

When deleting the partitioned topics, we have to check the ownership first and find that the bundle is not owned and then try to own the bundle and then trigger the system topic created async.

I come up with a new idea for deleting partitioned topics, we can set the LookupOptions.readOnly(true) when validateTopicOwnershipAsync to avoid loading the bundle.

@codelipenghui codelipenghui merged commit 89e82a2 into apache:master Aug 12, 2022
Technoboy- pushed a commit to Technoboy-/pulsar that referenced this pull request Aug 14, 2022
Technoboy- pushed a commit to merlimat/pulsar that referenced this pull request Aug 16, 2022
@poorbarcode poorbarcode deleted the flaky/deleteTenant branch September 17, 2022 02:45
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs type/flaky-tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky-test: AdminApi2Test.testDeleteTenant
3 participants