[BUG] Intermittent java.lang.Exception: Suite timeout exceeded (>= 1200000 msec). #1843
Comments
Tracked here #1715 as well, related to
Ran the test in batches of 100 but did not hit the suite timeout error, though I did get 3 different failures. The first is an assertion failure on the cluster reaching green state, while the last two are related to #1767. Command used
Commit: 1bb8864 Gradle failure output
@dreamer-89 this one is tracked under #1767
Ran tests in repeat mode with 500 count and was able to get
Test thread state from jstack output also depicts that test thread is in
Gradle output:
Yeah,
Added logs to get more insight into when the timeout occurs. It seems the test times out because the bulk request is not caught (bug?) as a shard indexing pressure breach (last successful duration limit) here by ShardIndexingMemoryManager. Because the request slips through, the thread keeps waiting in the CountDownLatch.await() call, and the suite eventually times out.
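The hang described above can be illustrated with a minimal, self-contained sketch. This is not the OpenSearch test code: the class and method names are hypothetical, and the "rejection" is reduced to a boolean flag. The caller waits on a latch that is only released when the expected rejection is signalled. If the rejection is skipped, an untimed `await()` would block until the suite timeout fires, so the sketch uses the timed `await` overload to show the same condition failing fast instead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from the OpenSearch code base.
public class LatchHangSketch {

    /**
     * Simulates a test step that waits for a rejection to be signalled.
     * Returns true if the latch was released within the timeout, false otherwise.
     */
    static boolean waitForRejection(boolean rejectionHappens, long timeoutMillis) {
        CountDownLatch rejected = new CountDownLatch(1);

        // Stand-in for the request path: it releases the latch only when
        // the request is actually rejected.
        Thread request = new Thread(() -> {
            if (rejectionHappens) {
                rejected.countDown();
            }
        });
        request.start();

        try {
            // A plain rejected.await() would never return when the rejection is skipped.
            return rejected.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejection signalled: " + waitForRejection(true, 1000));
        System.out.println("rejection skipped:   " + waitForRejection(false, 200));
    }
}
```

A bounded wait like this turns a 20-minute suite timeout into a quick, local failure that points at the latch that was never released.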
The code is expected to fail the check below, but it does not because of the condition below, including
To understand why, added another log statement to find out why this condition is not failing and raising
As expected,
There are a couple of ways to fix this:
The failures were no longer observed after adding a wait for outstanding requests to complete.
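As a rough sketch of what "waiting for outstanding requests to complete" can look like, the example below blocks a few simulated requests on a latch, releases them, and then polls an in-flight counter until it drains before moving on. Everything here is an assumption for illustration: `waitUntil`, `runScenario`, and the counter are hypothetical names, and the actual fix in the commit may be implemented differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical illustration; none of these names come from the OpenSearch code base.
public class OutstandingRequestsSketch {

    /** Polls the condition until it holds or the timeout elapses. */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (System.nanoTime() - deadline < 0) {
                if (condition.getAsBoolean()) {
                    return true;
                }
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return condition.getAsBoolean();
    }

    /** Starts blocked requests, releases them, and returns the in-flight count after the wait. */
    static int runScenario(int requests) {
        AtomicInteger outstanding = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        try {
            for (int i = 0; i < requests; i++) {
                outstanding.incrementAndGet();
                pool.execute(() -> {
                    try {
                        release.await(); // request stays outstanding until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        outstanding.decrementAndGet();
                    }
                });
            }
            release.countDown();
            // Without this wait, the next step could still see in-flight requests.
            if (!waitUntil(() -> outstanding.get() == 0, 5000)) {
                throw new IllegalStateException("requests still outstanding: " + outstanding.get());
            }
            return outstanding.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("outstanding after wait: " + runScenario(4));
    }
}
```

The point of the explicit wait is ordering: the next batch of requests is only issued once the previous batch is no longer counted as in flight, so later assertions do not race with requests that are still completing.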
Summary
What the test does
To exercise different shard indexing pressure parameters, the test performs bulk requests in succession on the coordinator node. To keep requests outstanding, a mock CountDownLatch behaviour is added on the primary node transport, which builds up the outstanding request count.
Broad set of steps in the test
Debug
The issue was debugged with the help of log statements. Each time, logs were added one layer deeper to understand why the test was failing. The tests were run multiple times in batches of 100/500 in order to reproduce the test timeout issue.
Thanks @andrross for help throughout the task and @getsaurabh02 for providing shard indexing insights.
Learnings
Commit
Commit fixes the issue.
Describe the bug
Intermittent test failures due to java.lang.Exception: Suite timeout exceeded (>= 1200000 msec). Sample jstack during timeout
To Reproduce
The issue is not straightforward to reproduce. The gradle check needs to be run in a loop multiple times.
Steps to reproduce the behavior:
Expected behavior
No test suite timeouts.