KAFKA-8894: Bump streams test topic deletion assertion timeout from 30s to 60s #7330

Closed
@@ -583,10 +583,10 @@ private void cleanGlobal(final boolean withIntermediateTopics,
     private void assertInternalTopicsGotDeleted(final String intermediateUserTopic) throws Exception {
         // do not use list topics request, but read from the embedded cluster's zookeeper path directly to confirm
         if (intermediateUserTopic != null) {
-            cluster.waitForRemainingTopics(30000, INPUT_TOPIC, OUTPUT_TOPIC, OUTPUT_TOPIC_2, OUTPUT_TOPIC_2_RERUN,
+            cluster.waitForRemainingTopics(60000, INPUT_TOPIC, OUTPUT_TOPIC, OUTPUT_TOPIC_2, OUTPUT_TOPIC_2_RERUN,
                     Topic.GROUP_METADATA_TOPIC_NAME, intermediateUserTopic);
         } else {
-            cluster.waitForRemainingTopics(30000, INPUT_TOPIC, OUTPUT_TOPIC, OUTPUT_TOPIC_2, OUTPUT_TOPIC_2_RERUN,
+            cluster.waitForRemainingTopics(60000, INPUT_TOPIC, OUTPUT_TOPIC, OUTPUT_TOPIC_2, OUTPUT_TOPIC_2_RERUN,
Member


I am always a little "concerned" about bumping timeouts if we don't understand why it actually fails. 30 seconds seems like quite some time.

How long is deleting topics supposed to take? As far as I understand, we send a single request to delete all internal topics via AdminClient to the brokers. Is there a relationship between the expected time to delete all topics and the number of topics in the request?

cc @cmccabe

Contributor Author


There is a relationship to the number of partitions, as far as I'm aware.
I don't believe it is normal for deletion to take more than 30 seconds in practice, but I can imagine it happening when the Jenkins workers are overloaded. That's my intuition as to why it failed once.

Contributor


I looked at the test, and at least one failed test, namely testReprocessingFromDateTimeAfterResetWithoutIntermediateUserTopic, does not have any internal topics to create, and hence none to delete. So it's not clear whether bumping up the timeout would help here.

I'd suggest we first augment the error messages to include the expected topics and the actual topics.
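
A rough sketch of what such an augmented failure message could look like. This is not the actual `waitForRemainingTopics` implementation in the embedded cluster; `TopicDeletionCheck`, `describeMismatch`, and `waitForTopics` are hypothetical names, and the topic listing is passed in as a `Supplier` so the example stays self-contained:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

public class TopicDeletionCheck {

    // Hypothetical helper: lists expected vs. actual topics, plus the difference in each direction.
    static String describeMismatch(final Set<String> expected, final Set<String> actual) {
        final Set<String> notYetDeleted = new TreeSet<>(actual);
        notYetDeleted.removeAll(expected);
        final Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(actual);
        return "Topics did not converge: expected=" + new TreeSet<>(expected)
            + ", actual=" + new TreeSet<>(actual)
            + ", not yet deleted=" + notYetDeleted
            + ", missing=" + missing;
    }

    // Hypothetical polling loop: waits until the remaining topics equal the expected set,
    // and fails with the detailed message once the timeout has elapsed.
    static void waitForTopics(final long timeoutMs,
                              final Supplier<Set<String>> actualTopics,
                              final Set<String> expected) throws InterruptedException {
        final long deadline = System.currentTimeMillis() + timeoutMs;
        Set<String> actual = actualTopics.get();
        while (!actual.equals(expected)) {
            if (System.currentTimeMillis() >= deadline) {
                throw new AssertionError(describeMismatch(expected, actual));
            }
            Thread.sleep(100L);
            actual = actualTopics.get();
        }
    }
}
```

With a message like this, a timeout would show directly whether a leftover internal topic or an unexpectedly missing topic caused the failure, which is the information needed before deciding whether a longer timeout helps.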

                     Topic.GROUP_METADATA_TOPIC_NAME);
         }
     }