-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CI] XPackRestIT timed out after 30 mins #52428
Comments
Pinging @elastic/es-core-infra (:Core/Infra/Build) |
Many tests wait for pending tasks to complete during their cleanup phase so that tasks started by one test do not continue to run during the next test and cause hard-to-reproduce failures in that subsequent test. As you say, this means that if one test leaves a rogue task behind that never completes then it fails every subsequent test that waits for pending tasks and also makes the test suite take an eternity to complete. So maybe we should change the "wait for pending tasks" test cleanup functionality such that if the wait times out it instantly fails the entire test suite. It's just a question of how to do this in such a way that:
|
The problem seems to be the test that failed before, which is the problem described in #52429 The task leakage happens because after the failure, the test does not continue execution and therefore does not stop the task. Independent of #52429 it raises the question why execution continues after the failure. Because after the root cause, task cleanup does not happen, therefore all subsequent tests have the leaked task. I see 2 solutions:
Conceptually I like "fail fast", however there is a reason why 2 has been implemented at least for rollup. I like to learn about the reason. @astefan I am not understanding the links below the first failure. 7.6 should not be affected by this issue. If I follow the links the builds look fine to me. |
@hendrikmuhs you are right, those links are not needed. Will remove them from the initial description. |
created a PR for #52429, this should fix the failure. I keep this issue open to clarify the desired behavior when a test fails and whether it should stop immediately or not, see #52428 (comment) |
@hendrikmuhs I think we generally agree we should fail fast. IMO the problem is the cleanup is being done as normal test actions. Instead, could it be done as actual cleanup? Currently each yml file has a single setup/cleanup that is run before/after each test section. If the cleanup of these can't go into that cleanup section, maybe we need to consider a dedication teardown within each test section. |
@rjernst Yes, that's a good point and should be possible, I have to move the test into a separate file but this also in the interest of cleaner test separation. I will implement your suggestion. |
restructure transform yaml tests to run cleanup in teardown phase relates #52428
restructure transform yaml tests to run cleanup in teardown phase relates #52428
While XPackRestIT still suffers from timeouts on MacOS, this failure was broader and fixed a while ago. |
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_RUNTIME_JAVA=java11,nodes=general-purpose/516/console
https://gradle-enterprise.elastic.co/s/frzrlcgeo45j4
But this occurred in a scenario where one of tests it seems it left a task behind -
p0=transform/transforms_stats/Test get continuous transform stats
- and all the subsequent tests to spend 10 seconds inassertBusy
and, thus, to add up and go over the 30 minutes timeout. Still, opening this issue for a second pair of eyes over this and given the history with yaml tests timing out lately.The text was updated successfully, but these errors were encountered: