Failing test: Package testing.assert_kibana_available : localhost:5601/api/status #106749

Closed
jbudz opened this issue Jul 26, 2021 · 10 comments
Labels: blocker, failed-test, Team:Operations, test-failure-flaky, v7.15.0

Comments

@jbudz
Member

jbudz commented Jul 26, 2021

04:22:44  TASK [assert_kibana_available : localhost:5601/api/status] *********************
04:23:16  fatal: [docker]: FAILED! => {"attempts": 1, "changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200, 401]: Connection failure: timed out", "redirected": false, "status": -1, "url": "http://localhost:5601/api/status"}

https://kibana-ci.elastic.co/job/elastic+kibana+package-testing-7.x/43
https://kibana-ci.elastic.co/job/elastic+kibana+package-testing/57

@jbudz jbudz added the blocker, Team:Operations, failed-test, test-failure-flaky, and v7.15.0 labels Jul 26, 2021
@jbudz jbudz self-assigned this Jul 26, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

jbudz added a commit that referenced this issue Jul 26, 2021
jbudz added a commit that referenced this issue Jul 26, 2021
@jbudz
Member Author

jbudz commented Jul 26, 2021

skipped
main: 08ef8da
7.x: a5e3de8

@cachedout
Contributor

cachedout commented Jul 29, 2021

Hi @jbudz

I would like to bring some more attention to this issue because it seems quite similar to one that we're seeing.

This may be a problem which is actually affecting other teams in Elastic. For example, in Observability, we run APM Integration Tests which have been failing quite frequently as a result of the status API not being available after the Kibana container starts.

We even raised our timeout to wait for this API in Kibana to 5 minutes and we're still seeing the same timeouts.

If we can't reliably count on Kibana to give us a signal that it has started successfully, this is quite problematic for the orchestration that our entire integration testing platform depends on.

A few questions:

  1. Is @elastic/kibana-operations tracking any regressions right now which could explain our occasional inability to hit the /api/status endpoint within 5 minutes of a Kibana container starting? (We see no errors coming from Kibana.)
  2. Is hitting this endpoint the best way to determine that Kibana is fully started and ready to receive requests? If not, could @elastic/kibana-operations recommend an alternative approach?
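
For readers following along, the kind of wait described here (poll until ready, give up after 5 minutes) could be sketched roughly as below. This is an illustration only, not the Observability team's actual orchestration code: `wait_until_ready`, its parameters, and the 5-second polling interval are hypothetical, and the probe is left as any zero-argument callable so the loop stays independent of how readiness is checked.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    deadline_seconds: float = 300.0,  # the 5 minutes mentioned in the thread
    interval_seconds: float = 5.0,    # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns True or the overall deadline passes."""
    give_up_at = clock() + deadline_seconds
    while True:
        if probe():
            return True
        # Stop if another full interval would overshoot the deadline.
        if clock() + interval_seconds > give_up_at:
            return False
        sleep(interval_seconds)
```

The clock and sleep functions are injectable so the loop can be exercised without real waiting. A probe that requests /api/status and returns whether the response was acceptable would be passed in as `probe`.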

We're definitely happy to continue providing any information that can help get this fixed, but we'd also like to see what could possibly be done about prioritizing a fix for this. Though it's certainly not a high priority for end users (probably?), it's definitely having a negative impact on our ability to deliver reliable testing for our teams.

Thanks in advance!

cc: @elastic/observablt-robots

@kuisathaverat
Contributor

These are the logs from the run @cachedout is talking about. There are no log entries after 2021-07-29T03:24:35+00:00. We wait 5 minutes for Kibana, checking /api/status, but Kibana never becomes available, so after that we quit.

[2021-07-29T03:24:09.159Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["info","plugins-service"],"pid":1195,"message":"Plugin \"telemetry\" is disabled."}
[2021-07-29T03:24:09.160Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["info","plugins-service"],"pid":1195,"message":"Plugin \"telemetryManagementSection\" has been disabled since the following direct or transitive dependencies are missing, disabled, or have incompatible types: [telemetry]"}
[2021-07-29T03:24:09.161Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["info","plugins-service"],"pid":1195,"message":"Plugin \"userSetup\" is disabled."}
[2021-07-29T03:24:09.166Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["info","plugins-service"],"pid":1195,"message":"Plugin \"metricsEntities\" is disabled."}
[2021-07-29T03:24:09.255Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["info","http","server","Preboot"],"pid":1195,"message":"http server running at http://0.0.0.0:5601"}
[2021-07-29T03:24:09.312Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["warning","config","deprecation"],"pid":1195,"message":"plugins.scanDirs is deprecated and is no longer used"}
[2021-07-29T03:24:09.312Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["warning","config","deprecation"],"pid":1195,"message":"xpack.fleet.agents.kibana is deprecated and is no longer used"}
[2021-07-29T03:24:09.312Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["warning","config","deprecation"],"pid":1195,"message":"Config key [xpack.fleet.agents.elasticsearch.host] is deprecated and replaced by [xpack.fleet.agents.elasticsearch.hosts]"}
[2021-07-29T03:24:09.312Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["warning","config","deprecation"],"pid":1195,"message":"\"xpack.reporting.roles\" is deprecated. Granting reporting privilege through a \"reporting_user\" role will not be supported starting in 8.0. Please set \"xpack.reporting.roles.enabled\" to \"false\" and grant reporting privileges to users using Kibana application privileges **Management > Security > Roles**."}
[2021-07-29T03:24:09.527Z] {"type":"log","@timestamp":"2021-07-29T03:24:09+00:00","tags":["info","plugins-system","standard"],"pid":1195,"message":"Setting up [109] plugins: [translations,licensing,globalSearch,globalSearchProviders,banners,licenseApiGuard,usageCollection,xpackLegacy,taskManager,telemetryCollectionManager,telemetryCollectionXpack,kibanaUsageCollection,securityOss,share,screenshotMode,newsfeed,mapsEms,mapsLegacy,legacyExport,kibanaLegacy,embeddable,uiActionsEnhanced,expressions,charts,esUiShared,bfetch,data,savedObjects,visualizations,visTypeXy,visTypeVislib,visTypeTimelion,features,visTypeTagcloud,visTypeTable,visTypePie,visTypeMetric,visTypeMarkdown,tileMap,regionMap,presentationUtil,expressionShape,expressionRevealImage,expressionRepeatImage,expressionMetric,expressionImage,timelion,home,searchprofiler,painlessLab,grokdebugger,graph,visTypeVega,management,watcher,upgradeAssistant,licenseManagement,indexPatternManagement,advancedSettings,discover,discoverEnhanced,dashboard,dashboardEnhanced,visualize,visTypeTimeseries,savedObjectsManagement,spaces,security,transform,savedObjectsTagging,lens,reporting,canvas,lists,ingestPipelines,fileUpload,maps,dataVisualizer,encryptedSavedObjects,dataEnhanced,dashboardMode,cloud,snapshotRestore,fleet,indexManagement,rollup,remoteClusters,crossClusterReplication,indexLifecycleManagement,eventLog,actions,alerting,triggersActionsUi,stackAlerts,ruleRegistry,osquery,ml,cases,timelines,securitySolution,observability,uptime,infra,monitoring,logstash,enterpriseSearch,console,apmOss,apm]"}
[2021-07-29T03:24:10.958Z] {"type":"log","@timestamp":"2021-07-29T03:24:10+00:00","tags":["info","plugins","taskManager"],"pid":1195,"message":"TaskManager is identified by the Kibana UUID: aca82d38-9d4e-401a-a31c-5b51156eb88d"}
[2021-07-29T03:24:21.977Z] {"type":"log","@timestamp":"2021-07-29T03:24:21+00:00","tags":["warning","plugins","security","config"],"pid":1195,"message":"Session cookies will be transmitted over insecure connections. This is not recommended."}
[2021-07-29T03:24:22.028Z] {"type":"log","@timestamp":"2021-07-29T03:24:22+00:00","tags":["warning","plugins","reporting","config"],"pid":1195,"message":"Generating a random key for xpack.reporting.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.reporting.encryptionKey in the kibana.yml or use the bin/kibana-encryption-keys command."}
[2021-07-29T03:24:22.043Z] {"type":"log","@timestamp":"2021-07-29T03:24:22+00:00","tags":["warning","plugins","reporting","config"],"pid":1195,"message":"Chromium sandbox provides an additional layer of protection, but is not supported for Linux Red Hat Linux 8.4 OS. Automatically setting 'xpack.reporting.capture.browser.chromium.disableSandbox: true'."}
[2021-07-29T03:24:30.364Z] {"type":"log","@timestamp":"2021-07-29T03:24:30+00:00","tags":["info","plugins","ruleRegistry"],"pid":1195,"message":"Write is disabled, not installing assets"}
[2021-07-29T03:24:30.722Z] {"type":"log","@timestamp":"2021-07-29T03:24:30+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"Waiting until all Elasticsearch nodes are compatible with Kibana before starting saved objects migrations..."}
[2021-07-29T03:24:31.237Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"Starting saved objects migrations"}
[2021-07-29T03:24:31.502Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"[.kibana] INIT -> CREATE_NEW_TARGET. took: 82ms."}
[2021-07-29T03:24:31.515Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"[.kibana_task_manager] INIT -> CREATE_NEW_TARGET. took: 91ms."}
[2021-07-29T03:24:31.757Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"[.kibana_task_manager] CREATE_NEW_TARGET -> MARK_VERSION_INDEX_READY. took: 243ms."}
[2021-07-29T03:24:31.789Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"[.kibana] CREATE_NEW_TARGET -> MARK_VERSION_INDEX_READY. took: 286ms."}
[2021-07-29T03:24:31.822Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"[.kibana_task_manager] MARK_VERSION_INDEX_READY -> DONE. took: 65ms."}
[2021-07-29T03:24:31.823Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"[.kibana_task_manager] Migration completed after 399ms"}
[2021-07-29T03:24:31.856Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"[.kibana] MARK_VERSION_INDEX_READY -> DONE. took: 68ms."}
[2021-07-29T03:24:31.856Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","savedobjects-service"],"pid":1195,"message":"[.kibana] Migration completed after 436ms"}
[2021-07-29T03:24:31.978Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","status"],"pid":1195,"message":"Kibana is now unavailable"}
[2021-07-29T03:24:31.979Z] {"type":"log","@timestamp":"2021-07-29T03:24:31+00:00","tags":["info","plugins-system","standard"],"pid":1195,"message":"Starting [109] plugins: [translations,licensing,globalSearch,globalSearchProviders,banners,licenseApiGuard,usageCollection,xpackLegacy,taskManager,telemetryCollectionManager,telemetryCollectionXpack,kibanaUsageCollection,securityOss,share,screenshotMode,newsfeed,mapsEms,mapsLegacy,legacyExport,kibanaLegacy,embeddable,uiActionsEnhanced,expressions,charts,esUiShared,bfetch,data,savedObjects,visualizations,visTypeXy,visTypeVislib,visTypeTimelion,features,visTypeTagcloud,visTypeTable,visTypePie,visTypeMetric,visTypeMarkdown,tileMap,regionMap,presentationUtil,expressionShape,expressionRevealImage,expressionRepeatImage,expressionMetric,expressionImage,timelion,home,searchprofiler,painlessLab,grokdebugger,graph,visTypeVega,management,watcher,upgradeAssistant,licenseManagement,indexPatternManagement,advancedSettings,discover,discoverEnhanced,dashboard,dashboardEnhanced,visualize,visTypeTimeseries,savedObjectsManagement,spaces,security,transform,savedObjectsTagging,lens,reporting,canvas,lists,ingestPipelines,fileUpload,maps,dataVisualizer,encryptedSavedObjects,dataEnhanced,dashboardMode,cloud,snapshotRestore,fleet,indexManagement,rollup,remoteClusters,crossClusterReplication,indexLifecycleManagement,eventLog,actions,alerting,triggersActionsUi,stackAlerts,ruleRegistry,osquery,ml,cases,timelines,securitySolution,observability,uptime,infra,monitoring,logstash,enterpriseSearch,console,apmOss,apm]"}
[2021-07-29T03:24:32.040Z] {"type":"log","@timestamp":"2021-07-29T03:24:32+00:00","tags":["info","plugins","monitoring","monitoring"],"pid":1195,"message":"config sourced from: production cluster"}
[2021-07-29T03:24:34.266Z] {"type":"log","@timestamp":"2021-07-29T03:24:34+00:00","tags":["info","http","server","Kibana"],"pid":1195,"message":"http server running at http://0.0.0.0:5601"}
[2021-07-29T03:24:34.597Z] {"type":"log","@timestamp":"2021-07-29T03:24:34+00:00","tags":["info","plugins","monitoring","monitoring","kibana-monitoring"],"pid":1195,"message":"Starting monitoring stats collection"}
[2021-07-29T03:24:35.621Z] {"type":"log","@timestamp":"2021-07-29T03:24:35+00:00","tags":["info","plugins","reporting"],"pid":1195,"message":"Browser executable: /usr/share/kibana/x-pack/plugins/reporting/chromium/headless_shell-linux_x64/headless_shell"}
[2021-07-29T03:24:35.622Z] {"type":"log","@timestamp":"2021-07-29T03:24:35+00:00","tags":["warning","plugins","reporting"],"pid":1195,"message":"Enabling the Chromium sandbox provides an additional layer of protection."}
[2021-07-29T03:24:35.647Z] {"type":"log","@timestamp":"2021-07-29T03:24:35+00:00","tags":["info","plugins","reporting","store"],"pid":1195,"message":"Creating ILM policy for managing reporting indices: kibana-reporting"}
[2021-07-29T03:24:35.853Z] {"type":"log","@timestamp":"2021-07-29T03:24:35+00:00","tags":["info","plugins","securitySolution"],"pid":1195,"message":"Dependent plugin setup complete - Starting ManifestTask"}
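
In the excerpt above, the only entry tagged "status" is "Kibana is now unavailable", and nothing is logged after 03:24:35. A small sketch of how such a log could be scanned for those two facts is shown below. It assumes each line has the shape `[ci-timestamp] {json}` as in the excerpt; the function names `parse_ci_log_line` and `summarize` are hypothetical helpers, not part of Kibana or the CI tooling.

```python
import json
from typing import Iterable, Optional


def parse_ci_log_line(line: str) -> Optional[dict]:
    """Skip the '[ci-timestamp] ' prefix and decode the JSON part, or return None."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def summarize(lines: Iterable[str]) -> dict:
    """Collect the last @timestamp seen and every message tagged 'status'."""
    last_timestamp = None
    status_messages = []
    for line in lines:
        record = parse_ci_log_line(line)
        if record is None:
            continue  # not a JSON log line
        last_timestamp = record.get("@timestamp", last_timestamp)
        if "status" in record.get("tags", []):
            status_messages.append(record.get("message"))
    return {"last_timestamp": last_timestamp, "status_messages": status_messages}
```

Run over a full container log, the `status_messages` list would show whether Kibana ever reported a status transition after the initial "unavailable" entry.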

@kseniia-kolpakova

@jbudz @tylersmalley Could you please provide the status of this? Thank you

@tylersmalley
Contributor

@cachedout, thanks for the information. I am not aware of this issue, but @elastic/kibana-core might be as they handle the HTTP server and the status endpoint. Our team does handle the functional test server, but it sounds like this issue is happening in multiple environments. For issues like this, or any issue you're facing for that matter, I would definitely recommend creating an issue to outline the problem you're facing so it can be addressed.

@cachedout
Contributor

Thanks for the reply @tylersmalley . I wasn't sure if a new issue was better since it seemed like it would be a duplicate of this one. I'll go ahead and take what's here and put it into a new issue for the core folks to look at.

@tylersmalley
Contributor

tylersmalley commented Jul 30, 2021

@cachedout, this issue is just to track that we have disabled the test in our package testing. We have yet to investigate a cause, which very well could be the issue you have described.

As for whether hitting the status endpoint is the right approach to determine if Kibana is up: I know a common practice for load balancer configurations with Kibana is to hit the root / and check the status code. I wonder if that would be more reliable for you in the meantime, or if this is an issue with the HTTP server entirely.
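
A load-balancer-style check against the root path might look roughly like the sketch below. It is only an illustration of the idea: the function names are hypothetical, and treating any non-5xx response as "up" is an assumption, since the comment above does not say which status codes should count. Redirects are deliberately not followed, so a redirect from / is reported as-is.

```python
import http.client
from typing import Optional


def root_status_code(host: str, port: int, timeout: float = 10.0) -> Optional[int]:
    """GET / without following redirects; return the status code, or None on failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Refused, reset, timed out, or a malformed response.
        return None
    finally:
        conn.close()


def looks_up(code: Optional[int]) -> bool:
    """Assumed policy: any response below 500 means the HTTP server is serving."""
    return code is not None and code < 500
```

For example, `looks_up(root_status_code("localhost", 5601))` would report whether the server answered at all with a non-error status.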

@cachedout
Contributor

Thanks @tylersmalley . I agree that filing an issue would have been a better approach. Apologies for not doing that originally.

I've filed the new issue here: #107300

(I'll also move your comment re: the / endpoint there so the discussion isn't split)

jbudz added a commit to jbudz/kibana that referenced this issue Aug 5, 2021
The socket timeout for testing whether the status page is available or
not is currently 30 seconds.  This test was disabled for being flaky.
Reproducing this locally hasn't been straightforward, but I am seeing
an average of ~20 seconds, which is close enough to the timeout that I'd
like to rule out machine differences.  This gives the status check 120
seconds before dropping the connection.

Related to elastic#106749 and elastic#107300
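
To make the effect of the socket timeout concrete, here is a rough Python analogue of a single status check that accepts 200 or 401, as the failing task did. It is not the Ansible task itself; `check_status` and its parameters are hypothetical. If the server needs about 20 seconds to answer, a call with `socket_timeout=30` has little headroom on a slower machine, while `socket_timeout=120` leaves plenty.

```python
import http.client
import urllib.error
import urllib.request
from typing import Iterable, Tuple


def check_status(
    url: str,
    socket_timeout: float,
    accepted: Iterable[int] = (200, 401),
) -> Tuple[bool, str]:
    """Make one request and return (ok, detail)."""
    accepted = set(accepted)
    try:
        with urllib.request.urlopen(url, timeout=socket_timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:
        code = err.code  # urllib reports 4xx/5xx responses as exceptions
    except (OSError, http.client.HTTPException) as err:
        # Includes the socket timing out before a response arrives.
        return False, f"connection failure: {err}"
    if code in accepted:
        return True, f"status {code}"
    return False, f"status {code} not in {sorted(accepted)}"
```

A call such as `check_status("http://localhost:5601/api/status", socket_timeout=120)` corresponds to the longer limit described in the commit message.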
streamich pushed a commit to vadimkibana/kibana that referenced this issue Aug 8, 2021
jbudz added a commit that referenced this issue Aug 16, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this issue Aug 16, 2021
kibanamachine added a commit that referenced this issue Aug 16, 2021
@jbudz
Member Author

jbudz commented Aug 23, 2021

Resolved by #107813

@jbudz jbudz closed this as completed Aug 23, 2021