Reinstate testRunnableRunsAtMostOnceAfterCancellation #99525

DaveCTurner · 2023-09-13T12:04:44Z

This test was failing in #34004 due to a race, and although #34296 made
the failures rarer they did not actually fix the race. Then in #99201 we
fixed the race but the resulting test over-synchronizes and no longer
meaningfully verifies the concurrent behaviour we were originally trying
to check. It also fails for other reasons. This commit reverts back to
the original test showing that we might run the action at most once
after cancellation without any further synchronization, but fixes the
assertion to use the value of the counter observed immediately after the
cancellation since we cannot be sure that no extra iterations execute
before the cancellation completes.

This test was failing in elastic#34004 due to a race, and although elastic#34296 made the failures rarer they did not actually fix the race. Then in elastic#99201 we fixed the race but the resulting test over-synchronizes and no longer meaningfully verifies the concurrent behaviour we were originally trying to check. It also fails for other reasons. This commit reverts back to the original test showing that we might run the action at most once after cancellation without any further synchronization, but fixes the assertion to use the value of the counter observed immediately after the cancellation since we cannot be sure that no extra iterations execute before the cancellation completes.

elasticsearchmachine · 2023-09-13T12:05:08Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

thecoop · 2023-09-13T12:30:36Z

@DaveCTurner I'm not sure what this change is doing, nor what the issues are with the current code. Could you run through the problems this fixes please?

DaveCTurner · 2023-09-13T12:33:59Z

The thing that brought it to my attention was that this test is still failing, see https://gradle-enterprise.elastic.co/s/oxapeenawfjii.

But the (original) point of this test is that a cancellation on any old random thread permits at most one more execution before the cancellation kicks in. With the changes in #99201 we're asserting something very different about how this component behaves when running in lockstep with the cancelling thread, which is not the situation that really matters in practice.

thecoop · 2023-09-13T12:51:20Z

Ah, I hadn't spotted the barrier breaks after an await timeout.

The previous version of the test wasn't asserting a range of runs, it was asserting a very specific number of runs - so it was testing that calling cancel actually does stop the schedule in a timely manner, which the test is still doing.

DaveCTurner · 2023-09-13T13:06:10Z

so it was testing that calling cancel actually does stop the schedule in a timely manner, which the test is still doing.

Yes, but this test has so much synchronization that it's going to pass even if, say, you remove the volatile modifier on org.elasticsearch.threadpool.Scheduler.ReschedulingRunnable#run

thecoop

OK. And this covers the case it was previously failing with - that there could be one additional run after the cancel.

…ncellation

elasticsearchmachine · 2023-09-13T14:38:48Z

💔 Backport failed

Status	Branch	Result
❌	8.10	Commit could not be cherrypicked due to conflicts
❌	7.17	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 99525

This test was failing in #34004 due to a race, and although #34296 made the failures rarer they did not actually fix the race. Then in #99201 we fixed the race but the resulting test over-synchronizes and no longer meaningfully verifies the concurrent behaviour we were originally trying to check. It also fails for other reasons. This commit reverts back to the original test showing that we might run the action at most once after cancellation without any further synchronization, but fixes the assertion to use the value of the counter observed immediately after the cancellation since we cannot be sure that no extra iterations execute before the cancellation completes.

DaveCTurner added >test Issues or PRs that are addressing/adding tests :Core/Infra/Core Core issues without another label v8.11.0 v8.10.1 labels Sep 13, 2023

DaveCTurner requested a review from thecoop September 13, 2023 12:04

elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Sep 13, 2023

DaveCTurner added the v7.17.14 label Sep 13, 2023

thecoop approved these changes Sep 13, 2023

View reviewed changes

DaveCTurner added the auto-backport-and-merge label Sep 13, 2023

Merge branch 'main' into 2023/09/13/testRunnableRunsAtMostOnceAfterCa…

605ede4

…ncellation

DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 13, 2023

elasticsearchmachine merged commit 5f9bb0c into elastic:main Sep 13, 2023

DaveCTurner deleted the 2023/09/13/testRunnableRunsAtMostOnceAfterCancellation branch September 13, 2023 14:38

elasticsearchmachine added the backport pending label Sep 13, 2023

DaveCTurner removed the backport pending label Sep 13, 2023

DaveCTurner restored the 2023/09/13/testRunnableRunsAtMostOnceAfterCancellation branch June 17, 2024 06:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reinstate testRunnableRunsAtMostOnceAfterCancellation #99525

Reinstate testRunnableRunsAtMostOnceAfterCancellation #99525

DaveCTurner commented Sep 13, 2023

elasticsearchmachine commented Sep 13, 2023

thecoop commented Sep 13, 2023 •

edited

Loading

DaveCTurner commented Sep 13, 2023

thecoop commented Sep 13, 2023

DaveCTurner commented Sep 13, 2023

thecoop left a comment

elasticsearchmachine commented Sep 13, 2023

Reinstate testRunnableRunsAtMostOnceAfterCancellation #99525

Reinstate testRunnableRunsAtMostOnceAfterCancellation #99525

Conversation

DaveCTurner commented Sep 13, 2023

elasticsearchmachine commented Sep 13, 2023

thecoop commented Sep 13, 2023 • edited Loading

DaveCTurner commented Sep 13, 2023

thecoop commented Sep 13, 2023

DaveCTurner commented Sep 13, 2023

thecoop left a comment

Choose a reason for hiding this comment

elasticsearchmachine commented Sep 13, 2023

💔 Backport failed

thecoop commented Sep 13, 2023 •

edited

Loading