Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reinstate testRunnableRunsAtMostOnceAfterCancellation #99525

Conversation

DaveCTurner
Copy link
Contributor

This test was failing in #34004 due to a race, and although #34296 made
the failures rarer they did not actually fix the race. Then in #99201 we
fixed the race but the resulting test over-synchronizes and no longer
meaningfully verifies the concurrent behaviour we were originally trying
to check. It also fails for other reasons. This commit reverts back to
the original test showing that we might run the action at most once
after cancellation without any further synchronization, but fixes the
assertion to use the value of the counter observed immediately after the
cancellation since we cannot be sure that no extra iterations execute
before the cancellation completes.

This test was failing in elastic#34004 due to a race, and although elastic#34296 made
the failures rarer they did not actually fix the race. Then in elastic#99201 we
fixed the race but the resulting test over-synchronizes and no longer
meaningfully verifies the concurrent behaviour we were originally trying
to check. It also fails for other reasons. This commit reverts back to
the original test showing that we might run the action at most once
after cancellation without any further synchronization, but fixes the
assertion to use the value of the counter observed immediately after the
cancellation since we cannot be sure that no extra iterations execute
before the cancellation completes.
@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :Core/Infra/Core Core issues without another label v8.11.0 v8.10.1 labels Sep 13, 2023
@DaveCTurner DaveCTurner requested a review from thecoop September 13, 2023 12:04
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Sep 13, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@thecoop
Copy link
Member

thecoop commented Sep 13, 2023

@DaveCTurner I'm not sure what this change is doing, nor what the issues are with the current code. Could you run through the problems this fixes please?

@DaveCTurner
Copy link
Contributor Author

The thing that brought it to my attention was that this test is still failing, see https://gradle-enterprise.elastic.co/s/oxapeenawfjii.

But the (original) point of this test is that a cancellation on any old random thread permits at most one more execution before the cancellation kicks in. With the changes in #99201 we're asserting something very different about how this component behaves when running in lockstep with the cancelling thread, which is not the situation that really matters in practice.

@thecoop
Copy link
Member

thecoop commented Sep 13, 2023

Ah, I hadn't spotted the barrier breaks after an await timeout.

The previous version of the test wasn't asserting a range of runs, it was asserting a very specific number of runs - so it was testing that calling cancel actually does stop the schedule in a timely manner, which the test is still doing.

@DaveCTurner
Copy link
Contributor Author

so it was testing that calling cancel actually does stop the schedule in a timely manner, which the test is still doing.

Yes, but this test has so much synchronization that it's going to pass even if, say, you remove the volatile modifier on org.elasticsearch.threadpool.Scheduler.ReschedulingRunnable#run

Copy link
Member

@thecoop thecoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. And this covers the case it was previously failing with - that there could be one additional run after the cancel.

@DaveCTurner DaveCTurner added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 13, 2023
@elasticsearchmachine elasticsearchmachine merged commit 5f9bb0c into elastic:main Sep 13, 2023
@DaveCTurner DaveCTurner deleted the 2023/09/13/testRunnableRunsAtMostOnceAfterCancellation branch September 13, 2023 14:38
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.10 Commit could not be cherrypicked due to conflicts
7.17 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 99525

DaveCTurner added a commit that referenced this pull request Sep 13, 2023
This test was failing in #34004 due to a race, and although #34296 made
the failures rarer they did not actually fix the race. Then in #99201 we
fixed the race but the resulting test over-synchronizes and no longer
meaningfully verifies the concurrent behaviour we were originally trying
to check. It also fails for other reasons. This commit reverts back to
the original test showing that we might run the action at most once
after cancellation without any further synchronization, but fixes the
assertion to use the value of the counter observed immediately after the
cancellation since we cannot be sure that no extra iterations execute
before the cancellation completes.
DaveCTurner added a commit that referenced this pull request Sep 13, 2023
This test was failing in #34004 due to a race, and although #34296 made
the failures rarer they did not actually fix the race. Then in #99201 we
fixed the race but the resulting test over-synchronizes and no longer
meaningfully verifies the concurrent behaviour we were originally trying
to check. It also fails for other reasons. This commit reverts back to
the original test showing that we might run the action at most once
after cancellation without any further synchronization, but fixes the
assertion to use the value of the counter observed immediately after the
cancellation since we cannot be sure that no extra iterations execute
before the cancellation completes.
@DaveCTurner DaveCTurner restored the 2023/09/13/testRunnableRunsAtMostOnceAfterCancellation branch June 17, 2024 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team >test Issues or PRs that are addressing/adding tests v7.17.14 v8.10.1 v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants