Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][test] Fix flaky test deleteNamespaceGracefully #18220

Merged
merged 3 commits into from
Nov 24, 2022

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Oct 27, 2022

Fixes #18232

Motivation

https://github.com/poorbarcode/pulsar/actions/runs/3267034877/jobs/5371836130
https://github.com/apache/pulsar/actions/runs/3242148385/jobs/5326408439

The original implementation determines whether to wait for the creation of __chang_event based on whether a broker has taken ownership of the bundle or not. However, there are often 'set namespace policy', 'unload namespace', 'delete namespace', 'split bundle', and other operations in the test case. When the bundle unload and the 'bundle checkconcurrently execute, the methoddeleteNamespaceGraceFully` will run unstably.

Modifications

  • Instead of waiting for __change_evnt to be created, it would be simpler to try again if it failed.
  • change method name deleteNamespaceGraceFully -> deleteNamespaceWithRetry
  • Originally, deleteNamespaceGraceFully in class BrokerTestBase, and then MockedPulsarServiceBaseTest called BrokerTestBase.deleteNamespaceGraceFully(). But MockedPulsarServiceBaseTest is the super class of BrokerTestBase, so move method deleteNamespaceGraceFully to MockedPulsarServiceBaseTest.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@@ -164,7 +164,7 @@ public void resetClusters() throws Exception {
pulsar.getConfiguration().setForceDeleteNamespaceAllowed(true);
for (String tenant : admin.tenants().getTenants()) {
for (String namespace : admin.namespaces().getNamespaces(tenant)) {
deleteNamespaceGraceFullyByMultiPulsars(namespace, true, admin, pulsar,
deleteNamespaceGraceFully(namespace, true, admin, pulsar,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question, can we use the admin.namespaces().deleteNamespace() instead of this?

deleteNamespaceGraceFully looks like a hack operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When admin.namespaces().deleteNamespace() and auto create topic __change_event are concurrent executed, there is a problem #17070, and deleteNamespaceGraceFully is used to solve this problem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to fix the flaky test this way: make delete namespace after __change_events is successfully created, but there still has another race condition: the __change_events/__compaction async creates and __change_events delete by namespace delete. Therefore, I will disabled systemTopic in method testDeleteTenant to solve this flaky test.

Do you point?

If right, I suggest we should fix this first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In user use, the probability of occurrence is almost 0. Even if it actually occurs, the user can try to execute the delete namespace again. But to solve this problem fundamentally requires a big change

@poorbarcode poorbarcode force-pushed the flaky/deleteNsGraceFully branch from 05212ce to dce5599 Compare November 1, 2022 18:41
@poorbarcode poorbarcode force-pushed the flaky/deleteNsGraceFully branch from dce5599 to d0d9f3f Compare November 10, 2022 13:39
@poorbarcode poorbarcode force-pushed the flaky/deleteNsGraceFully branch 2 times, most recently from 8dc781c to d740815 Compare November 18, 2022 13:06
}

/**
* Wait until system topic "__change_event" and subscription "__compaction" are created, and then delete the namespace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the description need to be modified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed.

canPausedNamespaceService.pause();
}

Awaitility.await().until(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add the most wait time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed.

@Technoboy- Technoboy- added this to the 2.12.0 milestone Nov 21, 2022
@Technoboy- Technoboy- closed this Nov 21, 2022
@Technoboy- Technoboy- reopened this Nov 21, 2022
@Technoboy- Technoboy- changed the title [fix][test]fix flaky test deleteNamespaceGracefully [fix][test] Fix flaky test deleteNamespaceGracefully Nov 21, 2022
@poorbarcode poorbarcode force-pushed the flaky/deleteNsGraceFully branch 2 times, most recently from ebec5a7 to 882cd56 Compare November 21, 2022 08:58
@nodece
Copy link
Member

nodece commented Nov 22, 2022

@poorbarcode Could you fix these conflicts?

@poorbarcode poorbarcode force-pushed the flaky/deleteNsGraceFully branch from 882cd56 to d442374 Compare November 23, 2022 15:56
@codecov-commenter
Copy link

codecov-commenter commented Nov 23, 2022

Codecov Report

Merging #18220 (801b233) into master (cd85a67) will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #18220      +/-   ##
============================================
- Coverage     47.39%   47.35%   -0.05%     
- Complexity    10479    10483       +4     
============================================
  Files           698      698              
  Lines         68070    68070              
  Branches       7279     7279              
============================================
- Hits          32264    32235      -29     
- Misses        32228    32255      +27     
- Partials       3578     3580       +2     
Flag Coverage Δ
unittests 47.35% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../pulsar/broker/service/SharedConsumerAssignor.java 3.70% <0.00%> (-74.08%) ⬇️
...apache/pulsar/broker/service/EntryAndMetadata.java 0.00% <0.00%> (-40.75%) ⬇️
.../apache/pulsar/broker/loadbalance/LoadManager.java 61.11% <0.00%> (-16.67%) ⬇️
...sistent/PersistentDispatcherMultipleConsumers.java 51.55% <0.00%> (-7.96%) ⬇️
...pulsar/broker/service/PulsarCommandSenderImpl.java 73.84% <0.00%> (-4.62%) ⬇️
.../pulsar/broker/service/BrokerServiceException.java 42.59% <0.00%> (-3.71%) ⬇️
...tent/PersistentDispatcherSingleActiveConsumer.java 55.17% <0.00%> (-3.14%) ⬇️
...oker/service/schema/SchemaRegistryServiceImpl.java 60.66% <0.00%> (-2.71%) ⬇️
...g/apache/pulsar/broker/admin/impl/BrokersBase.java 60.45% <0.00%> (-1.37%) ⬇️
...apache/pulsar/proxy/server/LookupProxyHandler.java 56.46% <0.00%> (-1.30%) ⬇️
... and 36 more

@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

1 similar comment
@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

1 similar comment
@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@poorbarcode
Copy link
Contributor Author

@nodece

Could you fix these conflicts?

Already fixed.

@nodece nodece merged commit c544ea3 into apache:master Nov 24, 2022
@poorbarcode poorbarcode deleted the flaky/deleteNsGraceFully branch November 24, 2022 09:57
lifepuzzlefun pushed a commit to lifepuzzlefun/pulsar that referenced this pull request Dec 9, 2022
lifepuzzlefun pushed a commit to lifepuzzlefun/pulsar that referenced this pull request Jan 10, 2023
liangyepianzhou pushed a commit that referenced this pull request Feb 25, 2023
liangyepianzhou added a commit that referenced this pull request Feb 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test doc-not-needed Your PR changes do not impact docs ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky-test: AdminApi2Test.resetClusters
5 participants