Fix retry logic in BrokerClient and flaky BrokerClientTest #16618
BrokerClientTest was added in PR #14322. When this test is run from the IDE (outside of Maven), testError() reliably fails. It passes in CI only because of the Maven setting <surefire.rerunFailingTestsCount>3</surefire.rerunFailingTestsCount> in the pom.xml, which lets Surefire rerun a failing test up to three times. The test eventually passes on a rerun because of the retry logic in the code, but Maven reports it as "flaky".

Changes
Looking into the error handling in BrokerClient, I found that the retry logic doesn't effectively handle 5xx errors. The HTTP error codes are converted into DruidExceptions, so the retry decision has to be made on DruidException categories, and the existing logic didn't fully account for transient errors. This patch fixes that. We could instead change the semantics of the method to use the HTTP response directly, or write a service client that uses the built-in mechanisms, but either would be a larger change.

I also removed the code that converts all non-200 error codes into retryable 5xx DruidException categories. For example, 4xx error codes shouldn't be retried. If there are additional 5xx error codes we need to retry, we can add them explicitly, similar to the ones that are currently handled.
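To illustrate the intended behavior, here is a minimal Java sketch. It is not the actual BrokerClient code. Only an explicit allow-list of transient 5xx statuses is retried, with exponential backoff, and everything else (including 4xx) propagates on the first failure. The names RetrySketch, HttpStatusException, isRetryable and callWithRetry, as well as the retryable set {502, 503, 504}, are hypothetical. The real patch works with DruidException categories rather than raw status codes, and the codes it retries may differ.

```java
import java.util.Set;
import java.util.concurrent.Callable;

public class RetrySketch
{
  // Hypothetical stand-in for an error carrying the original HTTP status.
  // In BrokerClient the status is mapped to a DruidException category instead.
  static class HttpStatusException extends RuntimeException
  {
    final int status;

    HttpStatusException(int status)
    {
      super("HTTP " + status);
      this.status = status;
    }
  }

  // Assumption: only these transient 5xx codes are retried. Further codes
  // would be added here explicitly rather than retrying every non-200 status.
  static final Set<Integer> RETRYABLE_STATUSES = Set.of(502, 503, 504);

  static boolean isRetryable(Throwable t)
  {
    return t instanceof HttpStatusException
           && RETRYABLE_STATUSES.contains(((HttpStatusException) t).status);
  }

  // Runs op up to maxAttempts times, sleeping with doubling backoff between
  // attempts, and retries only failures that isRetryable() accepts.
  static <T> T callWithRetry(Callable<T> op, int maxAttempts, long initialBackoffMs) throws Exception
  {
    long backoff = initialBackoffMs;
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      }
      catch (Exception e) {
        if (!isRetryable(e) || attempt >= maxAttempts) {
          throw e; // 4xx and other non-transient errors propagate immediately
        }
        Thread.sleep(backoff);
        backoff *= 2;
      }
    }
  }

  public static void main(String[] args) throws Exception
  {
    // A transient 503 twice, then success: the third attempt returns "ok".
    int[] calls = {0};
    String result = callWithRetry(() -> {
      calls[0]++;
      if (calls[0] < 3) {
        throw new HttpStatusException(503);
      }
      return "ok";
    }, 5, 1);
    System.out.println(result + " after " + calls[0] + " attempts");

    // A 404 is not retried: the call fails after a single attempt.
    int[] notFoundCalls = {0};
    try {
      callWithRetry(() -> {
        notFoundCalls[0]++;
        throw new HttpStatusException(404);
      }, 5, 1);
    }
    catch (HttpStatusException e) {
      System.out.println("gave up on " + e.getMessage() + " after " + notFoundCalls[0] + " attempt");
    }
  }
}
```

Keeping the retryable codes in one explicit set means supporting another transient 5xx code is a one-line change, while client errors such as 4xx fail fast instead of being retried.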
This PR has: