Reinstate testRunnableRunsAtMostOnceAfterCancellation #99525
This test was failing in elastic#34004 due to a race, and although elastic#34296 made the failures rarer it did not actually fix the race. Then in elastic#99201 we fixed the race, but the resulting test over-synchronizes and no longer meaningfully verifies the concurrent behaviour we were originally trying to check. It also fails for other reasons. This commit reverts to the original test, which shows that the action may run at most once more after cancellation without any further synchronization, but fixes the assertion to use the value of the counter observed immediately after the cancellation, since we cannot be sure that no extra iterations execute before the cancellation completes.
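The assertion described above can be sketched roughly as follows. This is a minimal illustration, not the actual Elasticsearch test: `CancellationSketch`, `ReschedulingTask`, and `cancelAndObserve` are hypothetical names, and the task is a simplified stand-in for the real rescheduling component. The point is only the shape of the check: read the counter right after cancelling, then allow at most one more run.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class CancellationSketch {

    /** Hypothetical stand-in for the component under test: reruns an action until cancelled. */
    static final class ReschedulingTask implements Runnable {
        private final ScheduledExecutorService scheduler;
        private final Runnable action;
        private final AtomicBoolean cancelled = new AtomicBoolean();

        ReschedulingTask(ScheduledExecutorService scheduler, Runnable action) {
            this.scheduler = scheduler;
            this.action = action;
        }

        void cancel() {
            cancelled.set(true);
        }

        @Override
        public void run() {
            // A run that passed this check before cancel() was called may still
            // execute the action once; every later run sees the flag and stops.
            if (cancelled.get()) {
                return;
            }
            action.run();
            try {
                scheduler.schedule(this, 1, TimeUnit.MILLISECONDS);
            } catch (RejectedExecutionException e) {
                // The scheduler was shut down while this run was in flight.
            }
        }
    }

    /** Returns {count observed right after cancel, final count}. */
    static long[] cancelAndObserve() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicLong counter = new AtomicLong();
        ReschedulingTask task = new ReschedulingTask(scheduler, counter::incrementAndGet);
        scheduler.execute(task);

        Thread.sleep(20);               // let some iterations happen
        task.cancel();                  // cancel from an unrelated thread, no lockstep
        long observed = counter.get();  // value seen immediately after cancellation

        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
        return new long[] { observed, counter.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] counts = cancelAndObserve();
        // At most one extra run may complete after the cancellation.
        if (counts[1] < counts[0] || counts[1] > counts[0] + 1) {
            throw new AssertionError("observed=" + counts[0] + " final=" + counts[1]);
        }
        System.out.println("observed=" + counts[0] + " final=" + counts[1]);
    }
}
```

Comparing against the value read after `cancel()` rather than a value captured beforehand is what keeps the check free of extra synchronization: iterations that ran before the cancellation took effect are already included in the observed count.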
Pinging @elastic/es-core-infra (Team:Core/Infra)
@DaveCTurner I'm not sure what this change is doing, nor what the issues are with the current code. Could you run through the problems this fixes please?
The thing that brought it to my attention was that this test is still failing, see https://gradle-enterprise.elastic.co/s/oxapeenawfjii. But the (original) point of this test is that a cancellation on any old random thread permits at most one more execution before the cancellation kicks in. With the changes in #99201 we're asserting something very different about how this component behaves when running in lockstep with the cancelling thread, which is not the situation that really matters in practice.
Ah, I hadn't spotted the barrier breaks after an await timeout. The previous version of the test wasn't asserting a range of runs, it was asserting a very specific number of runs - so it was testing that calling |
Yes, but this test has so much synchronization that it's going to pass even if, say, you remove the |
OK. And this covers the case it was previously failing with - that there could be one additional run after the cancel.
💔 Backport failed
You can use sqren/backport to manually backport by running |