Hitting /api/status successfully takes a very long time after start
#107300
Comments
(Moving a comment by @tylersmalley here so the conversation isn't split)
We could do that, but we need to ensure that Kibana is fully functional, which is why we chose the
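Checking that Kibana's HTTP port answers is weaker than checking that Kibana is fully functional. As a minimal sketch of the stricter check, the function below treats a /api/status body as ready only when its overall level is `available`. It assumes the 8.x-style response shape `{"status": {"overall": {"level": ...}}}`; that shape is an assumption here (older 7.x responses use a different legacy format), and `is_fully_available` is a hypothetical helper, not part of Kibana.

```python
import json
from typing import Any, Optional


def _field(obj: Any, key: str) -> Optional[Any]:
    """Return obj[key] if obj is a dict, else None (tolerates unexpected shapes)."""
    return obj.get(key) if isinstance(obj, dict) else None


def is_fully_available(body: str) -> bool:
    """Hypothetical readiness check for a Kibana /api/status response body.

    Assumes the 8.x-style shape {"status": {"overall": {"level": ...}}}.
    Invalid JSON or any other shape is treated as "not ready".
    """
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    level = _field(_field(_field(payload, "status"), "overall"), "level")
    return level == "available"
```

Returning `False` for anything unexpected keeps an orchestrator polling rather than declaring Kibana ready on a response it does not understand.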
Wondering if elastic/beats#27036 is also related to this issue.
@cachedout I just tried this with the latest 7.14.0 BCs and I'm not able to reproduce this issue with the default configuration. For me, it took 17s for Kibana to go into an
A few questions:
We made a few changes to this API recently, including fixing an issue related to slowness of getting to
The socket timeout for testing whether the status page is available is currently 30 seconds. This test was disabled for being flaky. Reproducing this locally hasn't been straightforward, but I am seeing an average of ~20 seconds, which is close enough to the timeout that I'd like to rule out machine differences. This gives the status check 120 seconds before dropping the connection. Related to elastic#106749 and elastic#107300
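The change described above is about the socket timeout on a single status request. A minimal standard-library sketch of such a probe follows; `probe_status` and `STATUS_SOCKET_TIMEOUT_S` are hypothetical names, and this is not the Kibana test suite's actual code. Note that `urllib` applies `timeout` to each blocking socket operation (the connection attempt, each read), not to the request as a whole.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical constant mirroring the change above: 120 s instead of 30 s.
STATUS_SOCKET_TIMEOUT_S = 120.0


def probe_status(url: str, timeout: float = STATUS_SOCKET_TIMEOUT_S) -> bool:
    """Send one GET to a status URL; return True only for an HTTP 200 response.

    `timeout` is the socket timeout urllib applies to each blocking operation,
    so a server that stalls longer than that fails the probe.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # 4xx/5xx responses (e.g. a 503 while starting up) are raised as HTTPError.
        return False
    except (OSError, http.client.HTTPException):
        # Connection refused, socket timeouts (TimeoutError) and malformed responses.
        return False
```

A short timeout makes a stalled server fail fast; a longer one, as in the change above, tolerates slow machines at the cost of waiting longer before giving up.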
Hi @joshdover, thanks again for the reply.
We're using nightly container snapshots of Kibana, which are configured exclusively through environment variables. Here's what we're setting, with a few values snipped:
Although we see this issue regularly in CI, it remains elusive when I try to reproduce it locally. This could very well be an environmental issue in CI, but I have yet to work out exactly what that might be.
Will do.
We are using the 8.0.0 nightly snapshots, so these fixes should be present.
In my local testing so far today, I have yet to see a single instance where it is nearly this quick. What typically happens is that once the HTTP server becomes available,

Ideally, I'd think, if the API were not available, Kibana would return an error code right away instead of holding the connection to the client open. That said, it's still unclear to me whether these long waits are the actual root of the problem we're experiencing. It's important to note that we are seeing this intermittently. I'll keep looking to see if I can find a pattern in the failures. Since the Kibana configuration doesn't really vary between runs, this looks like a race condition of some kind. I'll continue to update this issue as I learn more.
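On the client side, one way to keep a single held-open request from consuming the whole wait is to cap each attempt's timeout and retry until an overall deadline, which also turns an open-ended wait into an explicit error. A minimal sketch under those assumptions: `wait_until_ready` is a hypothetical helper, `probe` is any callable that takes a per-attempt timeout in seconds and reports whether Kibana looked ready, and the 300 s default mirrors the five-minute wait mentioned in this issue. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[float], bool],
    deadline_s: float = 300.0,
    attempt_timeout_s: float = 10.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe(timeout) until it returns True or deadline_s elapses.

    Returns the number of attempts on success; raises TimeoutError otherwise.
    """
    start = clock()
    attempts = 0
    while True:
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            raise TimeoutError(
                f"not ready after {attempts} attempt(s) in {deadline_s:.0f}s"
            )
        attempts += 1
        # Cap each attempt so one stalled request cannot eat the whole budget.
        if probe(min(attempt_timeout_s, remaining)):
            return attempts
        # Back off briefly, but never sleep past the deadline.
        sleep(max(0.0, min(interval_s, deadline_s - (clock() - start))))
```

As long as `probe` honors the timeout it is given, the loop should not overrun the deadline by more than the probe's own overhead, and the raised `TimeoutError` gives the orchestration a definite failure to log.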
An update on this one: we're waiting for new nightly snapshots that include the changes from this PR, and we're hoping those resolve this issue.
Thank you @cachedout! Let us know if we need to put more research into this issue.
@afharo Sadly, we're still seeing this problem. At this point, we're waiting a full five minutes for Kibana to return a response. It's really starting to seem less as if the
@tylersmalley and @afharo I now believe this is almost certainly a symptom of what's being tracked in #110583. How would you like to proceed in terms of tracking this issue? Would you prefer to keep both issues open, or to close this one in favor of #110583, which appears to be getting prioritized? 🤞
cc @elastic/kibana-core since I'm on leave.
#110583 is prioritized and currently in our 'next sprint' queue, so if we know that this issue is a duplicate or a symptom of #110583, I think it's best to close this one and track #110583 instead.
Kibana version:
Nightly snapshots
Elasticsearch version:
Nightly snapshots
Server OS version:
Linux
Browser version:
N/A
Browser OS version:
N/A
Original install method (e.g. download page, yum, from source, etc.):
Nightly snapshot
Describe the bug:
I am moving a conversation from #106749 to this issue at the request of @tylersmalley and re-tagging it with @elastic/kibana-core instead of @elastic/kibana-operations.
Summary
In Observability, we run APM Integration Tests, which have been failing quite frequently because the status API is not available after the Kibana container starts.
We even raised our timeout for this Kibana API to 5 minutes, and we're still seeing the same timeouts.
Of course, if we can't reliably count on Kibana to signal that it has started successfully, that is a real problem for the orchestration our entire integration testing platform depends on.
A few questions:
We're definitely happy to continue providing any information that can help get this fixed, but we'd also like to know what could be done about prioritizing a fix. Though it's probably not a high priority for end users, it's definitely having a negative impact on our ability to deliver reliable testing for our teams.
Logs
There are no logs after 2021-07-29T03:24:35+00:00. We wait 5 minutes for Kibana, checking /api/status, but there is no response. After the five-minute timeout, we bail out.
Steps to reproduce:
/api/status endpoint.
Expected behavior:
We expect to either be able to hit the endpoint within 5 minutes or see an error in the logs indicating why not.