MonitoringIT testMonitoringService fails in CI #29880
When discussing this test, it made little sense that testMonitoringService would fail but not testMonitoringBulk, given their similarity. So we agreed to enable it again. Relates #29880
This consistently fails for me in 6.x:
I'm fixing this test and will push an update.
We recently re-enabled MonitoringIT to hunt down #29880, but some of its assertions were out of date. This updates the assertions.
I pushed f311680. That doesn't fix the failure for which this issue was originally opened, but it does get the test passing again. Now we can get back to hunting the rare failure.
New failure, same as the original one: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=java10,ES_RUNTIME_JAVA=java8,nodes=virtual&&linux/265/console. I wasn't able to reproduce it.
Another failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/1852/console. Unable to reproduce on Linux (I don't think this is a Windows-specific failure).
OK, it failed again. What is really weird is that the index doesn't get created within 10 seconds, yet when we close the Node object after assertBusy times out, we see failed bulk requests in the log because the node is closing while they were still processing. I am dumping all thread stacks to the log to see whether that points to something interesting. Here is the log for future reference:
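As a minimal sketch of the thread-dump approach described in the previous comment (illustrative only; the class name is a placeholder and this is not the actual change or the log referenced above), every thread's stack can be captured with plain JDK APIs and written to the test log:

```java
import java.util.Map;

// Placeholder class; the idea, per the comment above, is to log every thread's stack
// when the wait expires so the CI output shows what the node was doing at that moment.
public final class ThreadStackDump {

    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            sb.append(entry.getKey()).append('\n'); // thread name, priority, and group
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```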
If you see this failure again, please share the output and disable the test.
It failed again here: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=sles/5/console and here's the full output: consoleText.txt.gz. I can't see anything terribly unusual in the jstack output, but I'm not sure what I'm looking for. However, I do note that the default monitoring interval is 10 seconds.
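If those two defaults really do line up, the first collection pass has almost no slack before the assertion gives up. Purely as an illustration (not a change anyone made here), ESTestCase's assertBusy overload with an explicit timeout is one way a test could allow more than the default 10 seconds; the helper name below is hypothetical:

```java
import java.util.concurrent.TimeUnit;

// Sketch only: budget 30 seconds instead of assertBusy's default 10 seconds, which is
// the same as the default monitoring collection interval discussed above.
// assertMonitoringDocsExist() is a hypothetical stand-in for the real assertions.
public void testMonitoringServiceWithLongerWait() throws Exception {
    assertBusy(() -> assertMonitoringDocsExist(), 30, TimeUnit.SECONDS);
}
```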
As requested, I disabled the test in 7f6c130.
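For context, tests in the Elasticsearch suite are typically muted with the @AwaitsFix annotation from the Lucene test framework, which skips the test while pointing at the tracking issue. A minimal sketch follows; whether 7f6c130 took exactly this form is an assumption, and the import path depends on the Lucene version in use:

```java
import org.apache.lucene.util.LuceneTestCase.AwaitsFix;

// The annotation keeps the test from running until the linked issue is resolved.
@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/29880")
public void testMonitoringService() throws Exception {
    // test body unchanged
}
```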
This wasn't muted on the 6.x branch, so it failed again there. The full log is preserved in consoleText.txt.gz and contains the jstack output from the time of the failure. I muted the test for 6.x in 33e46ab.
This failed in 6.5 where it wasn't muted. |
This failed again in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/1150/console The repro command was:
This didn't reproduce locally for me on a CentOS 7 server. As in the original issue description, the error is:
The log contains this jstack output, which may be of use:
Immediately after the failing test starts, a Watcher error occurs:
I don't know if that's the cause of the failure or completely irrelevant.
That ILM failure in the node should be unrelated; there is an open issue for it here: #38805
Another instance: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/1537/console. CI logs for the above run: logs.txt
Adding a build-stats link for tracking |
Should we re-mute this now that we've collected more data? It looks like it is still happening.
And another; I'll re-mute.
Relates to #29880. This test seems only to fail in master.
@jakelandis Looks like this test has been muted for some time now; should we just remove the test?
The test failure issue for this test has been open for over two years, but the test has been muted for so long that we don't have any actual failure information. This unmutes it so that either it ceases to fail (yay), or it fails and we have a Gradle build scan link that provides a bit more information. Relates to #29880
I opened a PR to unmute this. Since this issue is so old, I don't feel we should keep it around; rather, if the test fails again after unmuting, we can re-open this issue (all the build links in this issue have expired, so there's not much debugging we can do currently).
Original comment by @jaymode:
The MonitoringIT#testMonitoringService test fails (but not reproducibly) in CI because it does not find documents that the test expects (a rough sketch of this kind of check follows the links below). Failure links:
LINK REDACTED
LINK REDACTED
LINK REDACTED
LINK REDACTED
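As referenced above, here is a rough sketch of the kind of check that fails: poll with assertBusy (default timeout 10 seconds) until monitoring documents can be found. The index pattern, method name, and surrounding ESIntegTestCase context are assumptions for illustration, not the test's exact code:

```java
import org.elasticsearch.action.search.SearchResponse;

import static org.hamcrest.Matchers.greaterThan;

// Assumed to live in a test class extending ESIntegTestCase, so client() and assertBusy()
// are in scope; ".monitoring-es-*" is an assumed index pattern for the collected documents.
public void testMonitoringDocsWereCollected() throws Exception {
    assertBusy(() -> {
        SearchResponse response = client().prepareSearch(".monitoring-es-*").get();
        assertThat("expected monitoring documents to have been collected",
            response.getHits().getHits().length, greaterThan(0));
    });
}
```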
I am muting this test on master and 6.x. Assigning to @pickypg and @tlrx based on git history.