Flaky-test: PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor #20010
Closed · lhotari opened this issue on Apr 4, 2023 · 3 comments · Fixed by #20037 · May be fixed by BewareMyPower/pulsar#24
@BewareMyPower Do you have a chance to fix this flaky test, which was introduced by your PR #19972? Thanks.
I pushed PR #20025 to fix this flaky test. Please take a look. Thanks.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue on Apr 6, 2023

Fixes apache#20010

### Motivation

`PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor` is flaky because the cursor could still be created again in `startReplicator`, which could be called by:

```
onPoliciesUpdate
checkReplicationAndRetryOnFailure
checkReplication
```

### Modifications

- Call `checkReplicationCluster` before calling `startReplicator`.
- Support retrying `initialize` to see if retry works.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue on Apr 6, 2023

Fixes apache#20010

### Motivation

`PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor` is flaky because the cursor could still be created again in `startReplicator`, which could be called by:

```
onPoliciesUpdate
checkReplicationAndRetryOnFailure
checkReplication
```

### Modifications

- Call `checkReplicationCluster` before calling `startReplicator`.
- Support retrying `initialize` to see if retry works.
- Check replication cluster before creating the replication client
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue on Apr 6, 2023

Fixes apache#20010

### Motivation

`PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor` is flaky because the cursor could still be created again in `startReplicator`, which could be called by:

```
onPoliciesUpdate
checkReplicationAndRetryOnFailure
checkReplication
```

### Modifications

Call `checkReplicationCluster` before calling `startReplicator`.

Since there is still a rare chance that the cluster data is empty when the cluster still exists, return null instead of throwing a runtime exception, then skip creating the replication client.

Use `Awaitility` to check if the cursor has been deleted eventually.
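The "check eventually" idea in the last modification can be illustrated without the Awaitility dependency. The following is a minimal JDK-only sketch, not the test code from the PR: `EventuallyCheck`, `awaitUntil`, and the `cursorExists` flag are hypothetical names standing in for the real cursor lookup.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class EventuallyCheck {

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     * (Hypothetical helper; Awaitility offers a richer version of this idea.)
     */
    static boolean awaitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for "the replicator cursor still exists on the topic".
        AtomicBoolean cursorExists = new AtomicBoolean(true);

        // Simulate an asynchronous cleanup that deletes the cursor a little later.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(() -> cursorExists.set(false), 50, TimeUnit.MILLISECONDS);

        boolean deleted = awaitUntil(() -> !cursorExists.get(),
                Duration.ofSeconds(5), Duration.ofMillis(10));
        scheduler.shutdown();

        if (!deleted) {
            throw new AssertionError("cursor was not deleted within the timeout");
        }
        System.out.println("cursor deleted: " + deleted);
    }
}
```

Compared with a fixed sleep, polling with a deadline returns as soon as the asynchronous cleanup finishes and only fails when the timeout is actually exceeded.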
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue on Apr 7, 2023

Fixes apache#20010

### Motivation

`PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor` is flaky because the cursor could still be created again in `startReplicator`, which could be called by:

```
onPoliciesUpdate
checkReplicationAndRetryOnFailure
checkReplication
```

### Modifications

Call `checkReplicationCluster` before calling `startReplicator`.
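The guard described above, checking the replication cluster before starting the replicator, can be sketched roughly as follows. This is an illustrative model only, not Pulsar's `PersistentTopic` code: the class `ReplicatorGuard`, its methods, and the `repl.` cursor-name prefix are all assumptions made for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicatorGuard {
    // Hypothetical in-memory model of the configured replication clusters.
    private final Set<String> configuredClusters = ConcurrentHashMap.newKeySet();
    // Hypothetical model of the cursors that exist on the topic.
    private final Set<String> cursors = ConcurrentHashMap.newKeySet();

    /** Replaces the configured replication clusters, as a policy update would. */
    void setReplicationClusters(Set<String> clusters) {
        configuredClusters.retainAll(clusters);
        configuredClusters.addAll(clusters);
    }

    /**
     * Creates the replicator cursor only if the remote cluster is still configured.
     * Without this check, a late caller could re-create a cursor for a cluster
     * that has already been removed (a "zombie" cursor).
     */
    boolean startReplicatorIfConfigured(String remoteCluster) {
        if (!configuredClusters.contains(remoteCluster)) {
            return false; // skip: the cluster is no longer in the replication list
        }
        cursors.add("repl." + remoteCluster);
        return true;
    }

    boolean hasCursor(String remoteCluster) {
        return cursors.contains("repl." + remoteCluster);
    }

    public static void main(String[] args) {
        ReplicatorGuard topic = new ReplicatorGuard();
        topic.setReplicationClusters(Set.of("local"));
        // "remote" is not configured, so no cursor is created for it.
        System.out.println(topic.startReplicatorIfConfigured("remote"));
        System.out.println(topic.hasCursor("remote"));
    }
}
```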
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue on Apr 7, 2023

Fixes apache#20010

### Motivation

`PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor` is flaky because the cursor could still be created again in `startReplicator`, which could be called by:

```
onPoliciesUpdate
checkReplicationAndRetryOnFailure
checkReplication
```

### Modifications

- Call `checkReplicationCluster` before calling `startReplicator`.
- Sleep for a while in the test to reduce the flakiness caused by the asynchronous update of the policies
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue on Apr 7, 2023

Fixes apache#20010

### Motivation

`PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor` is flaky because the cursor could still be created again in `startReplicator`, which could be called by:

```
onPoliciesUpdate
checkReplicationAndRetryOnFailure
checkReplication
```

Sometimes the policies update might fail because the topic might be deleted in `PersistentTopic#checkReplication`:

> Deleting topic [xxx] because local cluster is not part of global namespace repl list [remote]

### Modifications

- Call `checkReplicationCluster` before calling `startReplicator`.
- Add the local cluster to the replication cluster list
- Sleep for a while in the test to reduce the flakiness caused by the asynchronous update of the policies
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this issue on Apr 11, 2023

Fixes apache#20010

### Motivation

`PersistentTopicTest.testCreateTopicWithZombieReplicatorCursor` is flaky because `onPoliciesUpdate` is asynchronous, while `testCreateTopicWithZombieReplicatorCursor` updates the namespace policy at nearly the same time, so there is a race in the order in which `AbstractTopic#topicPolicies` is updated.

Sometimes the policies update might fail because the topic might be deleted in `PersistentTopic#checkReplication`:

> Deleting topic [xxx] because local cluster is not part of global namespace repl list [remote]

### Modifications

- Sleep 100 ms between the two calls that update the replication clusters
- Add the local cluster to the replication cluster list
- Add retry logic for `initialize`
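The retry logic mentioned in the last modification can be pictured with a small bounded-retry helper. This is only a sketch under stated assumptions: `RetryInitialize` and `retry` are hypothetical names, and the lambda in `main` stands in for the test's `initialize` call, whose real signature is not shown in this thread.

```java
import java.util.concurrent.Callable;

public class RetryInitialize {

    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis between
     * failed attempts. Returns the first successful result, or rethrows the
     * last failure once all attempts are used up.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                throw e; // do not retry when the thread was interrupted
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Hypothetical initialize step that fails twice (for example because a
        // concurrent policy update deleted the topic) and then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("topic not ready");
            }
            return "initialized";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Bounding the number of attempts keeps a genuinely broken setup from hanging the test, while still absorbing the transient failures caused by the asynchronous policy update.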
Example failure
https://github.com/apache/pulsar/actions/runs/4604351215/jobs/8140558256?pr=20005#step:11:1177