Language Server LibrariesTest & co. tests fail due to timeouts in CI #8806
Comments
@JaroslavTulach commented that there are lots of ZIO threads running during the tests, which may be one of the reasons the tests are 'jamming'.
This is just a band-aid fix; we probably need to do something about the constant failures and heavy runner load. See #8806.
Another case, though on macOS: https://github.com/enso-org/enso/actions/runs/7639249532/job/20811863330?pr=8844
LibrariesTest failed. The last thread dump seems to contain 15 searcher.db-1 threads, which is 14 more than I'd expect. The LibrariesTest.log.gz file contains the last thread dump extracted from the raw output of the CI run.
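A quick way to check for this kind of duplication without reading a full dump is to count live threads by name. The following is a minimal sketch using only the JDK; the class and method names are hypothetical and not part of the Enso code base.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadNameCount {
    /** Counts live JVM threads grouped by their exact name. */
    public static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<>();
        // getAllStackTraces() returns a snapshot of all live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getName(), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns how many live threads carry the given name. */
    public static int countNamed(String name) {
        return countByName().getOrDefault(name, 0);
    }

    public static void main(String[] args) {
        // Print only names that occur more than once, the suspicious ones.
        countByName().forEach((name, n) -> {
            if (n > 1) {
                System.out.println(n + " x " + name);
            }
        });
    }
}
```

Calling `countNamed("searcher.db-1")` at the end of a test would make a leak like the one above visible as a number greater than one.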
I have a pending fix for the Actor System threads, but the ZIO stuff is proving difficult. I also noticed that the way we currently configure Runtime seems to ignore the assigned executor despite the explicit setting. Instead, ZIO appears to default to a regular ZScheduler, which uses all cores.
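For illustration, this is roughly what "an executor that does not scale with the core count" looks like in plain JDK terms. It is a sketch only: it does not show how to wire an executor into ZIO or Akka, and the class name, thread name prefix and limits are assumptions.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedTestExecutor {
    /** Caps the pool size at maxThreads, or at the core count if that is smaller. */
    public static int boundedSize(int maxThreads) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(maxThreads, cores));
    }

    /** Creates a fixed-size pool of named daemon threads for use in tests. */
    public static ThreadPoolExecutor create(String namePrefix, int maxThreads) {
        int size = boundedSize(maxThreads);
        AtomicInteger counter = new AtomicInteger();
        return new ThreadPoolExecutor(
            size, size, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(),
            r -> {
                // Named threads are easy to spot in a thread dump.
                Thread t = new Thread(r, namePrefix + "-" + counter.incrementAndGet());
                t.setDaemon(true);
                return t;
            });
    }
}
```

A pool created with `create("test-pool", 2)` never grows beyond two threads regardless of how many cores the CI runner reports.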
That's a good finding! It promises things may get better.
Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-28): Progress: Figured out how to reduce some of our resource usage for tests, but then I'm getting timeouts in other areas. Needs more work. It should be finished by 2024-01-30. Next Day: Next day I will be working on the #8806 task. Continue investigating #8806
Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-29): Progress: Still fighting CI to make #8801 pass; removing a circular dependency between subprojects finally did the trick. Added some improvements to #8806 to make Akka use fewer resources. Still fighting with ZIO. It should be finished by 2024-01-30. Next Day: Next day I will be working on the #8806 task. Continue investigating #8806
The defaults picked up by Akka tend to make use of all available resources, which is unnecessary and overwhelming for tests. Improves #8806, potentially.

When running the problematic `LibrariesTest`:

Before ![Screenshot from 2024-01-28 22-34-42](https://github.com/enso-org/enso/assets/292128/f80eb66a-2f37-44d5-bcdb-f00a78fe72fd)

After ![Screenshot from 2024-01-31 00-12-10](https://github.com/enso-org/enso/assets/292128/c5223912-5f6e-413c-a0a4-050afa3ed463)

Full `language-server` test suite:

Before ![Screenshot from 2024-01-31 00-20-50](https://github.com/enso-org/enso/assets/292128/f1c94a66-6905-4f57-8a7d-7df049714353)

After ![Screenshot from 2024-01-31 00-18-40](https://github.com/enso-org/enso/assets/292128/3a11125e-d593-43df-8d35-1a8915812b2b)

# Important Notes

Note that the executors assigned to ZIO and the initializers should also be improved. Unfortunately, due to various blocking thread pools, it is easy to get timeouts when running the whole suite.
Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-30): Progress: Managed to find a number of unclosed thread pools that led to memory leaks. ZIO is still on the list; will file a separate ticket. It should be finished by 2024-01-30. Next Day: Next day I will be working on the #8897 task. Pick up next item on the list.
Created a follow-up ticket. I haven't seen the timeout since this change.
Just bumped into another instance of a LibrariesTest failure due to a timeout in https://github.com/enso-org/enso/actions/runs/7738790398/job/21100330354?pr=8918#step:10:4033
I've linked one two messages above.
TestRuntime should be deprecated, as it creates a number of threads and doesn't make it easy to modify ZIO's runtime. But the biggest drop stems from fixing leaking `FileSystemService` instances that weren't being closed after every `TextOperationsTest` test. The change is a follow-up to #8892, but this time focused on ZIO usage. Hopefully fixes #8806 for good.

# Important Notes

Running `language-server/test`.

Before: ![Screenshot from 2024-02-02 09-48-32](https://github.com/enso-org/enso/assets/292128/fb414c74-7d7a-4e7b-8b0c-d25dc3721bbf)

After: ![Screenshot from 2024-02-02 09-46-02](https://github.com/enso-org/enso/assets/292128/db9429df-d861-4f48-818f-888d5bbbb089)
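The general shape of that kind of fix, sketched with plain JDK classes: a service that owns a thread pool implements `AutoCloseable`, and each test closes it deterministically. `PooledService` is a hypothetical stand-in, not the real `FileSystemService`, and the timeouts are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolCleanup {
    /** Hypothetical stand-in for a service that owns a thread pool. */
    public static final class PooledService implements AutoCloseable {
        private final ExecutorService pool = Executors.newFixedThreadPool(2);

        /** Runs a trivial computation on the service's own pool. */
        public int compute() throws Exception {
            return pool.submit(() -> 21 * 2).get(1, TimeUnit.SECONDS);
        }

        /** True once the pool has shut down and all its threads have exited. */
        public boolean isClosed() {
            return pool.isTerminated();
        }

        @Override
        public void close() throws InterruptedException {
            pool.shutdown();
            // Give running tasks a chance to finish, then force termination.
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        }
    }

    /** One "test": the service is closed even if the body throws. */
    public static PooledService runOneTest() throws Exception {
        try (PooledService service = new PooledService()) {
            if (service.compute() != 42) {
                throw new AssertionError("unexpected result");
            }
            return service; // close() runs before the method returns
        }
    }
}
```

Without the `close()` call, every test would leave its pool threads alive until the JVM exits, which is how thread counts creep up over a long suite.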
Despite all attempts to reduce resource usage, the test continues to be stubborn like a mule and randomly times out on CI. Adding an option to print stack traces, and maybe someone will be struck by lightning and be able to figure it out. Adding the stack trace in all cases pollutes the output from CI. Closes #8806.
The Engine tests fail from time to time due to timeouts.
Example:
Eventually followed by:
(full log archive, in case they are no longer accessible through GH Actions)
The tests should not have timeouts that can so easily be hit if the machine is under load or the tests are run on a slow machine.
We had previously increased the timeouts, however, the issues still occur.
#8785 added printing thread dumps when a timeout failure occurs, which may potentially help with debugging. (The dumps are present in the linked example.)
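As a rough illustration of the idea (not the code from #8785), a test body can be run under a timeout and a dump of all live threads captured only when the timeout is hit. The class, method and thread names below are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutDump {
    /** Formats the name, state and stack of every live thread. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Returns null if the task finished in time, otherwise a thread dump. */
    public static String runWithTimeout(Runnable task, long timeoutMillis) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "test-body");
            t.setDaemon(true);
            return t;
        });
        Future<?> result = exec.submit(task);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return null;
        } catch (TimeoutException e) {
            // Capture the dump while the stuck task is still running.
            return dumpAllThreads();
        } finally {
            result.cancel(true);
            exec.shutdownNow();
        }
    }
}
```

Producing the dump only on the timeout path keeps the output of passing runs clean.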
See also the last Discord discussion on the subject.