KAFKA-17661: Fix flaky BufferPoolTest.testBlockTimeout #17319
Another possible solution is to remove the asynchronous delayedDeallocate() calls in testBlockTimeout.
In fact, this solution looks good to me. We can have a follow-up to add multi-threaded tests for deallocate and allocate, but I don't think this test case requires asynchronous operations.
@chia7712 That makes sense to me. I will rewrite this PR.
Branch force-pushed from ffad249 to 4dc55ca.
Hi @chia7712, after rethinking it, I believe we can resolve the race while still keeping the multi-threaded test logic by removing the third delayed deallocation. The root cause of the race is that all three delayed deallocation threads can run before the test thread calls allocate, which frees enough memory for the allocation to succeed. With this change, the following test target can be kept in the test:
Below is the detailed explanation for the new test:
No matter when the delayed threads run, the test thread always hits the timeout. I just updated the PR to this version. The test passes 500 iterations in my local environment, and it also passes after adding a delay to the test thread.
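The bound behind this argument can be checked with plain arithmetic. The sketch below is hypothetical: it assumes a 10-byte pool with three 1-byte buffers outstanding when the test thread calls pool.allocate(10, maxBlockTimeMs), which is inferred from the shape of the test rather than stated in this thread, and TimeoutBound with its methods are illustrative names. The best case for the allocation is that every scheduled deallocation has already completed; if even that leaves fewer than 10 free bytes, the timeout is certain regardless of thread ordering.

```java
// Hypothetical arithmetic model of the revised testBlockTimeout, not Kafka code.
// Assumed setup: a 10-byte pool with three 1-byte buffers allocated before the
// test thread calls pool.allocate(10, maxBlockTimeMs).
public class TimeoutBound {
    static final int POOL_SIZE = 10;   // assumed total memory of the pool
    static final int OUTSTANDING = 3;  // buffer1..buffer3, 1 byte each (assumed)
    static final int REQUEST = 10;     // size passed to pool.allocate(10, maxBlockTimeMs)

    // Free bytes the test thread can see once `returned` deallocations have completed.
    static int freeBytes(int returned) {
        return POOL_SIZE - OUTSTANDING + returned;
    }

    // With `scheduled` delayed deallocations, the best case for the allocation is
    // that all of them complete first; if even that falls short, a timeout is certain.
    static boolean timeoutGuaranteed(int scheduled) {
        return freeBytes(scheduled) < REQUEST;
    }

    public static void main(String[] args) {
        for (int scheduled = 0; scheduled <= OUTSTANDING; scheduled++) {
            System.out.printf("scheduled=%d best-case free=%d timeout guaranteed=%b%n",
                    scheduled, freeBytes(scheduled), timeoutGuaranteed(scheduled));
        }
    }
}
```

With two scheduled deallocations the best case is 9 free bytes, so the timeout always hits; with all three, as in the original test, a late-running test thread can see all 10 bytes free and the allocation succeeds.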
What if the previous two threads haven’t allocated before this allocation? Would it still encounter a buffer exception?
@chia7712 So it won't happen, the first three
Sorry for the typo. My point is that deallocate is a no-op in this test, as the case will pass even if those threads haven't deallocated. If you prefer the solution of removing the threads, you should remove all deallocation threads.
You're right, removing the async deallocation doesn't change the test result.
I wouldn't say deallocate is a no-op in this test, but the original test name is not accurate.
@chia7712 WDYT?
I don't think this test case actually tests this behavior, which is why I believe deallocate is a no-op here: the related behavior isn't asserted in the test.
Perhaps this PR can simplify the test case to address the flakiness, and then you can file a follow-up to add more test cases.
OK, I just updated this PR to simplify the test case.
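A single-threaded test in the spirit of this simplification might look like the sketch below. It is hypothetical and not the code merged in this PR: java.util.concurrent.Semaphore stands in for BufferPool (one permit per free byte, without modelling BufferPool's waiter queue), and the 10-byte pool with three 1-byte buffers is an assumption. Because nothing is released asynchronously, the outcome no longer depends on scheduling; the sketch only checks that allocate(10) timed out after blocking for at least maxBlockTimeMs.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical single-threaded version of testBlockTimeout, not Kafka code.
// A Semaphore stands in for BufferPool: one permit represents one free byte.
public class SimplifiedBlockTimeout {
    static final long MAX_BLOCK_TIME_MS = 10; // maxBlockTimeMs quoted in the PR description

    // Returns the elapsed milliseconds when allocate(10) times out, or -1 if it unexpectedly succeeds.
    static long runOnce() {
        Semaphore pool = new Semaphore(10); // assumed 10-byte pool
        try {
            pool.acquire(3); // three 1-byte buffers stay allocated for the whole test
            long start = System.nanoTime();
            // Nothing is ever released, so 7 free bytes can never cover a 10-byte request.
            boolean allocated = pool.tryAcquire(10, MAX_BLOCK_TIME_MS, TimeUnit.MILLISECONDS);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            return allocated ? -1 : elapsedMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while allocating", e);
        }
    }

    public static void main(String[] args) {
        long elapsedMs = runOnce();
        if (elapsedMs < MAX_BLOCK_TIME_MS) {
            throw new AssertionError("expected a timeout after at least "
                    + MAX_BLOCK_TIME_MS + " ms, got " + elapsedMs);
        }
        System.out.println("allocate(10) timed out after " + elapsedMs + " ms");
    }
}
```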
@chenyulin0719 fyi, there was a build issue on trunk. I have fixed the issue and updated your branch with the fix.
@mumrah Thank you for letting me know. :)
LGTM
@chenyulin0719 Please file a Jira ticket as a follow-up to add more test cases.
Reviewers: Chia-Ping Tsai <[email protected]>
This PR addresses KAFKA-17661.
The test relies on three asynchronous threads running in parallel with the test thread [1]. However, parallel execution is not guaranteed in the test environment: the test thread can be descheduled long enough for all three delayed deallocations to finish first. A 25 ms delay measured against a 10 ms maxBlockTimeMs is therefore not a reliable ordering guarantee.
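The race can be replayed deterministically by fixing how many of the three delayed deallocations complete before the allocation's deadline. In this hypothetical sketch, java.util.concurrent.Semaphore stands in for BufferPool (one permit per free byte), and the 10-byte pool with three 1-byte buffers is an assumption; only the 10 ms maxBlockTimeMs and the allocate(10, ...) call come from the description. BlockTimeoutRace and bigAllocationSucceeds are illustrative names.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical deterministic replay of the original three-deallocation test, not Kafka code.
// A Semaphore stands in for BufferPool: one permit represents one free byte.
public class BlockTimeoutRace {
    static final long MAX_BLOCK_TIME_MS = 10; // maxBlockTimeMs from the PR description

    // deallocationsInTime: how many of the three delayed deallocations complete before
    // allocate(10)'s deadline. Returns true if the allocation succeeds, i.e. the
    // test's expected timeout is missed.
    static boolean bigAllocationSucceeds(int deallocationsInTime) {
        if (deallocationsInTime < 0 || deallocationsInTime > 3) {
            throw new IllegalArgumentException("only three buffers are outstanding");
        }
        Semaphore pool = new Semaphore(10); // assumed 10-byte pool
        try {
            pool.acquire(3); // buffer1..buffer3, 1 byte each (assumed)
            // Releasing the in-time deallocations up front gives the same outcome as
            // releasing them while allocate(10) is still waiting.
            pool.release(deallocationsInTime);
            return pool.tryAcquire(10, MAX_BLOCK_TIME_MS, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while allocating", e);
        }
    }

    public static void main(String[] args) {
        for (int inTime = 0; inTime <= 3; inTime++) {
            System.out.println(inTime + " deallocation(s) in time -> allocate(10) succeeds: "
                    + bigAllocationSucceeds(inTime));
        }
    }
}
```

In the intended schedule at most two deallocations complete in time and allocate(10) times out; if the test thread is delayed past 25 ms, all three have completed, the allocation succeeds, and the expected timeout is missed.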
To address the race between the asynchronous threads, this PR removes the third delayed deallocation so that pool.allocate(10, maxBlockTimeMs) always hits the timeout. [1]
[1] kafka/clients/src/test/java/org/apache/kafka/clients/producer/internals/BufferPoolTest.java, lines 175 to 179 in 4036081