[fix][test] Fix flaky testCreateTopicWithZombieReplicatorCursor #20037
Example logs of updating replication clusters:
We can see the interval is only 4 milliseconds but they happened in different threads ( |
@@ -1542,7 +1542,13 @@ public CompletableFuture<Void> checkReplication() {
continue;
}
if (!replicators.containsKey(cluster)) {
futures.add(startReplicator(cluster));
futures.add(checkReplicationCluster(cluster).thenCompose(clusterExists -> {
The method checkReplicationCluster guarantees that the remote cluster is still in the collection topicPolicies. But line 1524 and line 1540 also do this.
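The pattern under review here, running an asynchronous existence check and starting the replicator only if it passes, can be sketched with plain `CompletableFuture` composition. This is a minimal illustration, not the Pulsar implementation: the class `ReplicatorGateSketch` and its methods `checkCluster`, `startIfPresent`, and `startReplicator` are hypothetical stand-ins, and the replicator is reduced to a string in a map.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the broker logic; none of these names are Pulsar APIs.
class ReplicatorGateSketch {
    private final Set<String> configuredClusters;
    final Map<String, String> replicators = new ConcurrentHashMap<>();

    ReplicatorGateSketch(Set<String> configuredClusters) {
        this.configuredClusters = configuredClusters;
    }

    // Asynchronously reports whether the cluster is still configured.
    CompletableFuture<Boolean> checkCluster(String cluster) {
        return CompletableFuture.supplyAsync(() -> configuredClusters.contains(cluster));
    }

    // Records a replicator for the cluster (assumed to complete immediately here).
    CompletableFuture<Void> startReplicator(String cluster) {
        replicators.put(cluster, "replicator-" + cluster);
        return CompletableFuture.<Void>completedFuture(null);
    }

    // Starts a replicator only when the asynchronous check says the cluster exists.
    CompletableFuture<Void> startIfPresent(String cluster) {
        return checkCluster(cluster).thenCompose(exists -> {
            if (!exists) {
                // Cluster was removed in the meantime: skip without failing.
                return CompletableFuture.<Void>completedFuture(null);
            }
            return startReplicator(cluster);
        });
    }
}
```

If earlier checks in the same method already guarantee that the cluster is configured, as the comment above suggests, the extra `thenCompose` step only adds a redundant asynchronous hop.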
Fixes apache#20010

### Motivation

`PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor` is flaky because `onPoliciesUpdate` is asynchronous, while `testCreateTopicWithZombieReplicatorCursor` updates the namespace policy at nearly the same time, so there is a race in the order of updates to `AbstractTopic#topicPolicies`.

Sometimes the policies update might fail because the topic might be deleted in `PersistentTopic#checkReplication`:

> Deleting topic [xxx] because local cluster is not part of global namespace repl list [remote]

### Modifications

- Sleep 100ms between two calls of updating the replication clusters
- Add the local cluster to the replication cluster list
- Add the retry logic for `initialize`
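As a rough illustration of the retry idea mentioned in the modifications, the sketch below retries an asynchronous operation a bounded number of times. It is an assumption about the general shape of such logic, not the code in this PR: `RetrySketch`, `retry`, and `attempt` are hypothetical names, and no delay between attempts is modeled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical helper; not part of Pulsar or of this PR.
class RetrySketch {

    // Runs op up to maxAttempts times and completes with the first success,
    // or with the last failure once the attempts are exhausted.
    static <T> CompletableFuture<T> retry(Supplier<CompletableFuture<T>> op, int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(op, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> op, int remaining,
                                    CompletableFuture<T> result) {
        op.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (remaining <= 1) {
                // No attempts left: surface the last failure to the caller.
                result.completeExceptionally(error);
            } else {
                attempt(op, remaining - 1, result);
            }
        });
    }
}
```

A test could wrap its `initialize` call in such a helper so that a transient failure, for example the topic being deleted while policies are still propagating, does not fail the whole test.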
f8a5806 to bb254f1
Codecov Report
@@ Coverage Diff @@
## master #20037 +/- ##
=============================================
+ Coverage 34.65% 72.94% +38.28%
- Complexity 12429 31840 +19411
=============================================
Files 1606 1865 +259
Lines 125026 138040 +13014
Branches 13667 15167 +1500
=============================================
+ Hits 43332 100693 +57361
+ Misses 76081 29364 -46717
- Partials 5613 7983 +2370
As discussed on the mailing list (https://lists.apache.org/thread/w4jzk27qhtosgsz7l9bmhf1t7o9mxjhp), there is no plan to release 2.9.6, so I am going to remove the release/2.9.6 label.
(cherry picked from commit f2076b4)
Fixes #20010
Motivation
PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor is flaky because onPoliciesUpdate is asynchronous, while testCreateTopicWithZombieReplicatorCursor updates the namespace policy at nearly the same time, so there is a race in the order of updates to AbstractTopic#topicPolicies.
Sometimes the policies update might fail because the topic might be deleted in PersistentTopic#checkReplication.
Modifications
Add the retry logic for initialize
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: BewareMyPower#24