Language Server LibrariesTest & co. tests fail due to timeouts in CI #8806

Closed
mwu-tow opened this issue Jan 18, 2024 · 12 comments · Fixed by #8936 or #8967

@mwu-tow
Contributor

mwu-tow commented Jan 18, 2024

The Engine tests fail from time to time due to timeouts.

Example:

- should create a library project and include it on the list of local projects *** FAILED *** (5 seconds, 503 milliseconds)
  java.lang.AssertionError: assertion failed: timeout (5 seconds) during expectMsgClass waiting for class java.lang.String
  at scala.Predef$.assert(Predef.scala:279)
  at akka.testkit.TestKitBase.expectMsgClass_internal(TestKit.scala:571)
  at akka.testkit.TestKitBase.expectMsgClass(TestKit.scala:567)
  at akka.testkit.TestKitBase.expectMsgClass$(TestKit.scala:567)
  at akka.testkit.TestKit.expectMsgClass(TestKit.scala:973)
  at org.enso.jsonrpc.test.JsonRpcServerTestKit$WsTestClient.expectMessage(JsonRpcServerTestKit.scala:120)
  at org.enso.jsonrpc.test.JsonRpcServerTestKit$WsTestClient.expectSomeJson(JsonRpcServerTestKit.scala:158)
  at org.enso.languageserver.websocket.json.LibrariesTest.$anonfun$new$2(LibrariesTest.scala:137)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  ...

Eventually followed by:

Tests: succeeded 300, failed 1, canceled 0, ignored 4, pending 6
*** 1 TEST FAILED ***
 Failed tests:
 	org.enso.languageserver.websocket.json.LibrariesTest

(full log archive, in case they are no longer accessible through GH Actions)

The tests should not have timeouts that can so easily be hit if the machine is under load or the tests are run on a slow machine.

We previously increased the timeouts, but the failures still occur.

#8785 added printing of thread dumps when a timeout failure occurs, which may help with debugging. (The dumps are present in the linked example.)

See also the last Discord discussion on the subject.

@radeusgd
Member

@JaroslavTulach commented that there are lots of ZIO threads running during the tests, and this may be one of the reasons the tests are 'jamming'.

radeusgd added a commit that referenced this issue Jan 19, 2024
This is just a band-aid fix; we probably need to do something about the constant failures and heavy runner load, see: #8806
@mwu-tow
Contributor Author

mwu-tow commented Jan 25, 2024

Another case, though on macOS: https://github.com/enso-org/enso/actions/runs/7639249532/job/20811863330?pr=8844


@JaroslavTulach
Member

LibrariesTest failed - the last thread dump seems to contain 15 searcher.db-1 threads. That's 14 more than I'd expect.

The LibrariesTest.log.gz file contains the last thread dump extracted from the raw output of the CI run.
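
For anyone who wants to run the same kind of check locally, the following is a minimal sketch of counting live threads by name, which is what makes 15 threads all called `searcher.db-1` stand out. It uses only the JDK; the class name `ThreadNameCensus` and the `demo.pool-1` thread name are made up for illustration and are not part of the Enso code base.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

class ThreadNameCensus {
    // Counts live threads in this JVM by their exact name.
    // Several pools that each name their first worker "<pool>-1"
    // show up here as one name with a count greater than 1.
    static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getName(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Simulate three leaked pools whose single worker has the same name.
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "demo.pool-1");
            t.setDaemon(true);
            t.start();
        }
        // Report only names that occur more than once.
        countByName().forEach((name, n) -> {
            if (n > 1) {
                System.out.println(n + " x " + name);
            }
        });
        release.countDown();
    }
}
```

A name that appears more often than the number of pools you expect is a hint that something is creating pools without shutting them down.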

@JaroslavTulach JaroslavTulach changed the title Engine Tests failing sometimes due to timeouts Language Server CI tests fail due to timeouts Jan 27, 2024
@JaroslavTulach JaroslavTulach changed the title Language Server CI tests fail due to timeouts Language Server LibrariesTest & co. tests fail due to timeouts in CI Jan 27, 2024
@hubertp
Contributor

hubertp commented Jan 28, 2024

I have a pending fix for the Actor System threads, but the ZIO part is proving difficult. I also noticed that the way we currently configure Runtime seems to ignore the assigned executor despite the explicit setting. Instead, ZIO appears to default to a regular ZScheduler, which uses all cores.

@JaroslavTulach
Member

... the way we currently configure Runtime seems to ignore the assigned executor...

That's a good finding! It promises things may get better.

@enso-bot

enso-bot bot commented Jan 29, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-28):

Progress: Figured out how to reduce some of our resource usage for tests, but then I'm getting timeouts in other areas. Needs more work. It should be finished by 2024-01-30.

Next Day: Next day I will be working on the #8806 task. Continue investigating #8806

@hubertp hubertp self-assigned this Jan 29, 2024
@hubertp hubertp moved this from ❓New to 🔧 Implementation in Issues Board Jan 29, 2024
@enso-bot

enso-bot bot commented Jan 30, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-29):

Progress: Still fighting CI to make #8801 pass; removing a circular dependency between subprojects finally did the trick. Added some improvements to #8806 to make Akka use fewer resources. Still fighting with ZIO. It should be finished by 2024-01-30.

Next Day: Next day I will be working on the #8806 task. Continue investigating #8806

@hubertp hubertp moved this from 🔧 Implementation to 👁️ Code review in Issues Board Jan 31, 2024
mergify bot pushed a commit that referenced this issue Jan 31, 2024
The defaults picked up by Akka tend to make use of all available resources, which is unnecessary and overwhelming for tests.

Improves #8806, potentially.

Running the problematic `LibrariesTest`.
Before
![Screenshot from 2024-01-28 22-34-42](https://github.com/enso-org/enso/assets/292128/f80eb66a-2f37-44d5-bcdb-f00a78fe72fd)
After
![Screenshot from 2024-01-31 00-12-10](https://github.com/enso-org/enso/assets/292128/c5223912-5f6e-413c-a0a4-050afa3ed463)

Running the full `language-server` test suite.
Before
![Screenshot from 2024-01-31 00-20-50](https://github.com/enso-org/enso/assets/292128/f1c94a66-6905-4f57-8a7d-7df049714353)
After
![Screenshot from 2024-01-31 00-18-40](https://github.com/enso-org/enso/assets/292128/3a11125e-d593-43df-8d35-1a8915812b2b)

# Important Notes
Note that the executors assigned to ZIO and to the initializers should also be improved. Unfortunately, due to various blocking thread pools, it is easy to get timeouts when running the whole suite.
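
To illustrate the general idea of not letting a pool size itself from the number of cores, here is a minimal sketch in plain Java. It is not the Akka or ZIO configuration used in the PR; `BoundedTestPool`, `MAX_TEST_THREADS` and the cap of 2 are assumptions made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedTestPool {
    // Hypothetical cap; the right value for the real suite is not stated in this thread.
    static final int MAX_TEST_THREADS = 2;

    // Never more threads than the cap, never fewer than one.
    static int cappedParallelism(int availableCores, int cap) {
        return Math.max(1, Math.min(availableCores, cap));
    }

    // Creates a fixed-size pool with named daemon threads, so the workers are
    // easy to spot in a thread dump and cannot keep the JVM alive.
    static ExecutorService create(String name) {
        int size = cappedParallelism(
            Runtime.getRuntime().availableProcessors(), MAX_TEST_THREADS);
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, name + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return Executors.newFixedThreadPool(size, factory);
    }

    public static void main(String[] args) {
        ExecutorService pool = create("test-pool");
        pool.submit(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```

On a CI runner with many cores, a pool sized from `availableProcessors()` alone grows with the machine, while the capped version stays at the same small size for every test.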
@mwu-tow
Contributor Author

mwu-tow commented Jan 31, 2024

@hubertp
It seems that #8892 is not enough yet, as its merge-commit failed due to this very issue: https://github.com/enso-org/enso/actions/runs/7724599547/job/21057063189#step:10:4031


@enso-bot

enso-bot bot commented Jan 31, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-01-30):

Progress: Managed to find a number of unclosed threadpools that led to memory leaks. Zio is still on the list, will file a separate ticket. It should be finished by 2024-01-30.

Next Day: Next day I will be working on the #8897 task. Pick up next item on the list.

@hubertp
Contributor

hubertp commented Jan 31, 2024

Created a follow-up ticket. I haven't seen the timeout since this change.

@hubertp hubertp closed this as completed Jan 31, 2024
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Jan 31, 2024
@Akirathan
Member

Akirathan commented Feb 1, 2024

Just bumped into another instance of a LibrariesTest failure due to a timeout in https://github.com/enso-org/enso/actions/runs/7738790398/job/21100330354?pr=8918#step:10:4033


@mwu-tow
Contributor Author

mwu-tow commented Feb 1, 2024

Created a follow up ticket. I haven't seen the timeout since this change.

I've linked one two messages above.

@enso-bot enso-bot bot mentioned this issue Feb 2, 2024
mergify bot pushed a commit that referenced this issue Feb 2, 2024
TestRuntime should be deprecated, as it creates a number of threads and doesn't make it easy to modify ZIO's runtime.
But the biggest drop stems from fixing leaked `FileSystemService` instances that weren't being closed after every `TextOperationsTest` test.
The change is a follow-up to #8892, but this time focused on ZIO usage.

Hopefully fixes #8806 for good.
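
As an illustration of the kind of leak described above, here is a minimal sketch of a service that owns a thread pool and releases it on close. It is plain Java and not the actual `FileSystemService`; the `ClosableService` class and its 5-second termination wait are assumptions made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ClosableService implements AutoCloseable {
    // Each instance owns one worker thread. If close() is never called,
    // that thread stays alive and accumulates across tests.
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    void submit(Runnable task) {
        pool.submit(task);
    }

    boolean isTerminated() {
        return pool.isTerminated();
    }

    @Override
    public void close() {
        pool.shutdown();
        try {
            // Hypothetical grace period before forcing the shutdown.
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // try-with-resources guarantees close() runs even if the body throws.
        try (ClosableService service = new ClosableService()) {
            service.submit(() -> System.out.println("work done"));
        }
    }
}
```

In a test suite, the equivalent of the try-with-resources block is closing the service in the per-test teardown, so that one test's worker threads are gone before the next test starts.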

# Important Notes
Running `language-server/test`.
Before:
![Screenshot from 2024-02-02 09-48-32](https://github.com/enso-org/enso/assets/292128/fb414c74-7d7a-4e7b-8b0c-d25dc3721bbf)

After:
![Screenshot from 2024-02-02 09-46-02](https://github.com/enso-org/enso/assets/292128/db9429df-d861-4f48-818f-888d5bbbb089)
hubertp added a commit that referenced this issue Feb 5, 2024
Despite all attempts to reduce resource usage, the test continues to be
stubborn like a mule and randomly times out on CI. Adding an option to print
stack traces, so that maybe someone will be struck by lightning and be able to
figure it out. Printing the stack trace in all cases pollutes the output
from CI, hence the option.

Closes  #8806.
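
A minimal sketch of what an opt-in stack trace printout on timeout can look like, in plain Java. This is not the code from the commit; the class `TimeoutDiagnostics` and the system property name `test.dumpThreadsOnTimeout` are hypothetical and chosen only for this example.

```java
import java.util.Map;

class TimeoutDiagnostics {
    // Hypothetical switch; off by default so regular CI output stays short.
    static final String DUMP_PROPERTY = "test.dumpThreadsOnTimeout";

    // Renders the name, state and stack of every live thread as text.
    static String formatThreadDump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Returns the plain message unless the switch is set to "true",
    // in which case the thread dump is appended to it.
    static String describeTimeout(String message) {
        if (!Boolean.getBoolean(DUMP_PROPERTY)) {
            return message;
        }
        return message + "\n" + formatThreadDump();
    }

    public static void main(String[] args) {
        System.out.println(describeTimeout("timeout (5 seconds) waiting for message"));
    }
}
```

Running with `-Dtest.dumpThreadsOnTimeout=true` would append the dump to the failure message; without the flag only the one-line message is produced.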