[Fleet] Add workflow for requesting and downloading agent diagnostics from Fleet UI #141074
Comments
Pinging @elastic/fleet (Team:Fleet)
I've added this requirement based on an internal conversation w/ support where this was identified as a big win. @juliaElastic or @michel-laterman do you foresee any issues with "queuing up" diagnostics requests for multiple agents via
I don't see an issue with this.
Even better. Thanks for clarifying! This should make the process of requesting diagnostics from many agents simpler.
Yes, this can work the same way as other bulk actions.
@kpollich I think the bulk action of Request Diagnostics is more complex than we assumed: For bulk selection, it doesn't sound logical to navigate to the There is also a question about the size of the files the bulk action would produce: are there any concerns about uploading hundreds of MB per agent? The total could quickly become very large if the action is run on large agent selections.
I agree with this. We'll probably want to do something else after a bulk "request diagnostics" action is created.
Could we open the "agent activity" flyout after this bulk action is created to show the pending action w/ some detail about its status? Maybe we can display an expandable section where each agent is listed w/ a link to its
Yes, there'd be one zip per agent.
This is true, and users should be aware that diagnostics incur storage costs, and storage incurs monetary costs on Cloud. We may want to document this in a callout on the diagnostics tab. Something like
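To make the storage concern concrete, here is a minimal TypeScript sketch of the arithmetic such a callout or confirmation warning could surface. It assumes one zip per agent (as stated above) and a caller-supplied average zip size; the helper names are hypothetical and the sizes in the usage note are illustrative, not measured.

```typescript
// Hypothetical helper: rough storage estimate for one bulk
// "request diagnostics" action, assuming one zip per selected agent.
export function estimateBulkDiagnosticsBytes(
  agentCount: number,
  avgZipBytes: number,
): number {
  if (!Number.isInteger(agentCount) || agentCount < 0) {
    throw new RangeError("agentCount must be a non-negative integer");
  }
  if (!Number.isFinite(avgZipBytes) || avgZipBytes < 0) {
    throw new RangeError("avgZipBytes must be a non-negative number");
  }
  return agentCount * avgZipBytes;
}

// Format a byte count with binary units for display in a warning callout.
export function formatBytes(bytes: number): string {
  const units = ["B", "KiB", "MiB", "GiB", "TiB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  return `${value.toFixed(1)} ${units[unit]}`;
}
```

For example, 500 agents at an assumed 200 MiB per zip gives `formatBytes(estimateBulkDiagnosticsBytes(500, 200 * 1024 * 1024))`, which is `"97.7 GiB"`.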
Good idea, though showing a link for each agent would only work for a limited number of agents.
Yes, we can do that. A confirmation window with a warning message could even be added.
Good points here. For bulk actions, let's opt not to open the activity flyout once the action is created, then. Eventually, we'll enhance the flyout with more granular info about each agent for which an action was created. For now, though, I think just showing the status of the "request bulk diagnostics" operation is good enough. The user will likely dismiss the flyout and visit the agents individually to access diagnostics afterwards.
I'm realizing we don't have expiry captured anywhere here. We'll want to create a new index to store the uploaded files in, and create an ILM policy during Fleet setup to manage it. One thing I'm not clear on is whether we need to add to https://github.com/elastic/elasticsearch/tree/main/x-pack/plugin/core/src/main/resources (see
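As an illustration of the kind of ILM policy described above, here is a minimal sketch written as a typed TypeScript object. The overall shape (`policy.phases.<phase>.min_age` and `actions`, with `rollover` and `delete` actions) follows the Elasticsearch ILM policy API, but the rollover thresholds, the retention period, and the helper and policy names are hypothetical. Note that a later comment in this thread reports the ILM policies were removed from the index templates in 8.7.0.

```typescript
// Minimal types for the subset of an ILM policy body used here.
export interface IlmPhase {
  min_age?: string;
  actions: Record<string, Record<string, unknown>>;
}

export interface IlmPolicyBody {
  policy: {
    phases: Record<string, IlmPhase>;
  };
}

// Hypothetical retention policy for uploaded diagnostics files: roll the
// write index over daily or at 50gb per primary shard, then delete indices
// `retentionDays` after rollover. All values are illustrative only.
export function buildDiagnosticsIlmPolicy(retentionDays: number): IlmPolicyBody {
  if (!Number.isInteger(retentionDays) || retentionDays <= 0) {
    throw new RangeError("retentionDays must be a positive integer");
  }
  return {
    policy: {
      phases: {
        hot: {
          min_age: "0ms",
          actions: {
            rollover: { max_age: "1d", max_primary_shard_size: "50gb" },
          },
        },
        delete: {
          min_age: `${retentionDays}d`,
          actions: { delete: {} },
        },
      },
    },
  };
}

// The body would be sent with: PUT _ilm/policy/<policy-name>
export const EXAMPLE_POLICY_NAME = "fleet-diagnostics-files-example"; // hypothetical
```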
@kpollich We discussed on the call today that different indices will be needed for Fleet and Endpoint Security (and potentially other integrations in the future).
@kpollich @joshdover @paul-tavares I tested the Kibana File Service to query files from these indices, and got this error: Is this a known limitation of Kibana File Service? Are we expected to use a kibana prefix that impacts the privileges? When I tried to put
Here is the code that I used:
EDIT: the Kibana blocker is now resolved, and I managed to get the download working; see the PR description: #142369
@kpollich Do we mean to show the toast message when the user is on the
I would expect the toast to function in the same "async global" way that package installation toasts work. I think this is implemented using the Kibana notifications service?
Yes, I am using the Kibana notifications service. The question was more about whether it is okay to poll only while the user is on the Diagnostics tab, or whether we want polling to continue in the background when they navigate away.
Got it, thank you for clarifying. Let's just keep the polling on the diagnostics tab for now.
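Since polling stays on the Diagnostics tab, here is a minimal TypeScript sketch of tab-scoped polling that raises one notification per file when it becomes ready. `fetchUploads` and `onReady` are caller-supplied (for example, `onReady` could show a toast via the Kibana notifications service); the helper names, the `DiagnosticsUpload` shape, and every status name except `READY` are assumptions.

```typescript
// Hypothetical row shape; only "READY" appears in the sample document later
// in this thread, the other status names are assumptions.
export interface DiagnosticsUpload {
  actionId: string;
  status: "IN_PROGRESS" | "READY" | "FAILED" | "EXPIRED";
  name?: string;
}

// Uploads that are READY now but were not READY on the previous poll,
// so each file triggers at most one toast.
export function newlyReady(
  previous: DiagnosticsUpload[],
  current: DiagnosticsUpload[],
): DiagnosticsUpload[] {
  const wasReady = new Set(
    previous.filter((u) => u.status === "READY").map((u) => u.actionId),
  );
  return current.filter((u) => u.status === "READY" && !wasReady.has(u.actionId));
}

// Polls only while the Diagnostics tab is mounted: call the returned function
// from the tab's cleanup to stop. The first successful poll only records a
// baseline, so files that were already READY when the tab opened do not toast.
export function pollWhileMounted(
  fetchUploads: () => Promise<DiagnosticsUpload[]>,
  onReady: (upload: DiagnosticsUpload) => void,
  intervalMs = 5000,
): () => void {
  let previous: DiagnosticsUpload[] | undefined;
  let inFlight = false;
  let stopped = false;
  const timer = setInterval(async () => {
    if (inFlight) return; // skip this tick while a request is still pending
    inFlight = true;
    try {
      const current = await fetchUploads();
      if (stopped) return; // tab unmounted while the request was in flight
      if (previous !== undefined) {
        newlyReady(previous, current).forEach(onReady);
      }
      previous = current;
    } catch {
      // Ignore transient errors; the next tick retries.
    } finally {
      inFlight = false;
    }
  }, intervalMs);
  return () => {
    stopped = true;
    clearInterval(timer);
  };
}
```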
Blocked by #143459

## Summary

Closes #141074

### Request diagnostics action

Added a new action for a single agent (Agent details page and Agent list row actions) to request diagnostics. Clicking the action makes an API request that creates a `REQUEST_DIAGNOSTICS` type action in the `.fleet-actions` index.

### Diagnostics uploads display

When the action is submitted, the user is navigated to the new `Agent Details / Diagnostics` tab, which shows the list of pending and completed diagnostics file uploads. The information comes from the `/action_status` endpoint (for action status) and the `/uploads` endpoint (for file name and path). Clicking a diagnostics link should download the file as a zip.

<img width="1060" alt="image" src="https://user-images.githubusercontent.com/90178898/193816708-803c2a22-d421-4af2-9a78-785cdee81136.png">

Failed uploads display:

<img width="638" alt="image" src="https://user-images.githubusercontent.com/90178898/194058366-d4874339-9fd1-419e-99e5-f592a6b3bf6d.png">

Expired status was not specified separately in the design; it will be shown like the failed status (with a warning icon).

### Mock data (blocker)

Currently returning mock data in the `/uploads` API because of a blocker in Kibana File Service, see [here](#141074 (comment)).

### Bulk action

Added a bulk action too:

<img width="1759" alt="image" src="https://user-images.githubusercontent.com/90178898/194026861-bf0d5956-de2d-4d2b-895a-c35cf5252a5a.png">

Shows up in agent activity:

<img width="594" alt="image" src="https://user-images.githubusercontent.com/90178898/194026960-356a5b40-1203-4182-ad7b-89b1432bf0f6.png">

The Fleet Server / Agent changes are not there yet, though Fleet Server delivers the action and Agents ack it (this looks like the default behavior for unknown actions as well).

### Confirmation modal

Added a confirmation modal when clicking the action button everywhere, except for the `Request diagnostics` button on the Diagnostics page.

Open question:
- Do we want to display the confirmation window on the Diagnostics page button too?

<img width="673" alt="image" src="https://user-images.githubusercontent.com/90178898/194065175-715b158e-0628-4bd9-86db-920c1ec9825e.png">

### Download

Generated file path to download in this format: `/api/fleet/agents/files/{fileId}/{fileName}`

Decided not to use the `files` plugin's API because it doesn't have the Fleet authorization around it.

Screen recording demonstrating the download of an agent diagnostics zip file that I uploaded using the Fleet Server upload API (using [Dan's pr](elastic/fleet-server#1902) locally):

https://user-images.githubusercontent.com/90178898/194287842-c7f09c9e-5310-460f-9cae-6fc7fa7750de.mov

### Notification

Added a toast message that shows up when a diagnostics file becomes ready, while we are on the Diagnostics tab.

https://user-images.githubusercontent.com/90178898/194318170-e7ec66db-8bf8-4535-b07e-682397c2920c.mov

### Checklist

Delete any items that are not applicable to this PR.

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Kibana Machine <[email protected]>
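The download path format and the "expired is shown like failed" rule in the PR description can be sketched in TypeScript as two small pure functions. Only the path template and that display rule come from the PR; the status names in `ActionStatus`, the `DisplayState` labels, and the helper names are hypothetical. Encoding each segment with `encodeURIComponent` keeps file names containing spaces or `/` from breaking the path.

```typescript
// Builds the download path in the format described in the PR:
// /api/fleet/agents/files/{fileId}/{fileName}
export function diagnosticsDownloadPath(fileId: string, fileName: string): string {
  return `/api/fleet/agents/files/${encodeURIComponent(fileId)}/${encodeURIComponent(fileName)}`;
}

// Hypothetical status names for a diagnostics action.
export type ActionStatus = "IN_PROGRESS" | "COMPLETE" | "FAILED" | "EXPIRED";
// Hypothetical display states for the Diagnostics tab.
export type DisplayState = "pending" | "ready" | "warning";

// Expired has no separate design, so it renders like failed (warning icon).
export function displayState(status: ActionStatus): DisplayState {
  switch (status) {
    case "IN_PROGRESS":
      return "pending";
    case "COMPLETE":
      return "ready";
    case "FAILED":
    case "EXPIRED":
      return "warning";
    default: {
      // Compile-time exhaustiveness check over ActionStatus.
      const unreachable: never = status;
      throw new Error(`Unknown status: ${String(unreachable)}`);
    }
  }
}
```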
Reopening as there is a pending change to enable the feature flag.
## Summary Follow up for #141074 Added Request Diagnostics to OpenAPI spec
Moved this back to blocked, as we are waiting for the dependent changes to be merged before the feature flag can be enabled in 8.7.
Is this still blocked? I see #143459 is completed.
@amitkanfer Yes, this is still blocked by elastic/fleet-server#1902 and elastic/elastic-agent#1703
… to use upload_id (#149575)

## Summary

Closes #141074

Enabled the feature flag and tweaked the implementation to find the file by `upload_id` rather than by doc id.

How to test:
- Start local Kibana, start Fleet Server, and enroll Elastic Agent from local (pull [these changes](elastic/elastic-agent#1703))
- Click the Request Diagnostics action on the Agent
- The diagnostics file should appear on the Agent Details / Diagnostics tab.
- The action should be completed on Agent activity

<img width="1585" alt="image" src="https://user-images.githubusercontent.com/90178898/214805187-2b1abe34-ba7e-4612-9fad-7ef1f5942f47.png"> <img width="745" alt="image" src="https://user-images.githubusercontent.com/90178898/214805997-20fdaa01-e4c5-461c-b395-1b1e43117f8a.png">

The file metadata and binary can be queried from these indices:

```
GET .fleet-files-agent/_search
GET .fleet-file-data-agent/_search
```

Tweaked the implementation so that pending actions show up as soon as the `.fleet-actions` record is created (it can take several minutes until the action result is ready).

Also added a tooltip for error status:

<img width="948" alt="image" src="https://user-images.githubusercontent.com/90178898/214841337-eacbb1fc-4934-4d8b-9d52-8db4502d2493.png">

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
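For readers following along, here is a minimal TypeScript sketch of a search request that finds the metadata document by `upload_id`. Only the index name `.fleet-files-agent` and the `upload_id` field come from this thread; treating `upload_id` as an exact-match field suitable for a `term` filter, and the type and helper names, are assumptions.

```typescript
// Minimal shape of an Elasticsearch search request used in this sketch.
export interface TermFilter {
  term: Record<string, string>;
}

export interface FileSearchRequest {
  index: string;
  size: number;
  query: { bool: { filter: TermFilter[] } };
}

// Look up file metadata by upload_id. A filter clause skips scoring,
// which suits an exact-match lookup of a single document.
export function fileByUploadId(uploadId: string): FileSearchRequest {
  return {
    index: ".fleet-files-agent",
    size: 1,
    query: { bool: { filter: [{ term: { upload_id: uploadId } }] } },
  };
}
```

With the 8.x Elasticsearch JavaScript client, an object of this shape could be passed to `client.search(...)`, which accepts `index`, `size`, and `query` as top-level parameters.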
## Summary

Changed the query for diagnostics files so the files show up sooner. This is needed because the agent currently takes about 4 minutes to ack the action, which has to be fixed separately; see elastic/elastic-agent#1703 (comment).

Related to #141074

We can search for the diagnostics file by `agent_id` and `action_id`, so we don't have to wait for the `upload_id`, which comes from `.fleet-actions-results`.

https://user-images.githubusercontent.com/90178898/215451881-bfaa9e86-e055-4490-87b1-dc1d1076a738.mov

Displaying the error from the agent when diagnostics failed:

<img width="839" alt="image" src="https://user-images.githubusercontent.com/90178898/215476207-5db7e935-28dd-432e-a6a6-195da162028a.png">

E.g. `.fleet-files-agent`:

```
{
  "_index": ".fleet-files-agent-000001",
  "_id": "8a004559-0731-4b8f-b29e-d7405ca0d68c.3a1f21b3-4559-4d3f-aae0-58356c269a92",
  "_score": null,
  "_source": {
    "action_id": "8a004559-0731-4b8f-b29e-d7405ca0d68c",
    "agent_id": "3a1f21b3-4559-4d3f-aae0-58356c269a92",
    "contents": null,
    "file": {
      "ChunkSize": 4194304,
      "Status": "READY",
      "ext": "zip",
      "hash": {
        "md5": "",
        "sha256": ""
      },
      "mime_type": "application/zip",
      "name": "elastic-agent-diagnostics-2023-01-30T10-13-33Z-00.zip",
      "size": 577178
    },
    "src": "agent",
    "upload_id": "988da8ad-9d92-4d18-b5b0-b2a7e77f5a81",
    "upload_start": 1675073615066,
    "transithash": {
      "sha256": "8a417cc8a73e32723ff449b603412113f319c7447044e81acab3f57d4e8226c8"
    }
  }
}
```

Changed the style to be more consistent:

<img width="898" alt="image" src="https://user-images.githubusercontent.com/90178898/215492173-7362fab7-15e6-4de9-824b-239164512231.png">

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
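The lookup by `agent_id` and `action_id`, and turning the sample `.fleet-files-agent` document into a table row, can be sketched in TypeScript as follows. The field names are taken from the sample document; the `DiagnosticsRow` model, the helper names, and the reading of `ChunkSize` as the size of the chunks the binary is split into are assumptions.

```typescript
// Subset of the `.fleet-files-agent` `_source` shown in the sample document.
export interface FleetFileSource {
  action_id: string;
  agent_id: string;
  upload_id: string;
  upload_start: number; // epoch milliseconds
  file: {
    ChunkSize: number;
    Status: string;
    name: string;
    size: number;
  };
}

export interface TermQuery {
  index: string;
  size: number;
  query: { bool: { filter: Array<{ term: Record<string, string> }> } };
}

// Search by agent_id + action_id so the UI need not wait for the upload_id
// from `.fleet-actions-results`.
export function fileByAgentAndAction(agentId: string, actionId: string): TermQuery {
  return {
    index: ".fleet-files-agent",
    size: 1,
    query: {
      bool: {
        filter: [{ term: { agent_id: agentId } }, { term: { action_id: actionId } }],
      },
    },
  };
}

// Hypothetical row model for the Diagnostics tab.
export interface DiagnosticsRow {
  fileName: string;
  sizeBytes: number;
  ready: boolean;
  uploadedAt: Date;
  chunkCount: number;
}

export function toRow(src: FleetFileSource): DiagnosticsRow {
  return {
    fileName: src.file.name,
    sizeBytes: src.file.size,
    ready: src.file.Status === "READY",
    uploadedAt: new Date(src.upload_start),
    // How many chunks of ChunkSize bytes the binary spans.
    chunkCount: Math.ceil(src.file.size / src.file.ChunkSize),
  };
}
```

Applied to the sample document, `toRow` yields a ready row with one chunk (577178 bytes is below the 4194304-byte chunk size) and an upload time of `2023-01-30T10:13:35.066Z`.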
Hi Team, we have executed 11 test cases under the Feature test run for the 8.7.0 release at link: Status:
Build details: As testing of this feature is complete, we are marking this as QA:Validated. Please let us know if anything else is required from our end.
We had to remove the ILM policies from the index templates in 8.7.0 due to an issue; more detail here: #153483
We should write a docs issue and prepare some draft documentation for this feature. Support and our end users would greatly appreciate some references to this functionality in our troubleshooting docs.
@juliaElastic Could you file a docs issue for this? (cc @karenzone)
Blocked by https://github.com/elastic/security-team/issues/4661
Background
A common supportability concern with Fleet/Agent is the collection and investigation of diagnostics. Elastic Agent exposes the `elastic-agent diagnostics collect` command, which outputs a `.zip` containing various diagnostics information that's crucial for debugging purposes.
We'd like to expose these diagnostics files in Fleet UI to improve debuggability and reduce support overhead when requesting these diagnostics.
There are a few components at play here:
- `UPLOAD_DIAGNOSTICS` (name not final) action type for initiating the collection -> upload of Agent diagnostics

Implementation
- `REQUEST_DIAGNOSTICS` action type in the existing `POST /api/fleet/agents/<id>/action` API
- `GET /api/fleet/agents/<id>/uploads`
- `GET /api/files/files/<id>/blob[/<filename>]`
- `/agents/:id`
- `REQUEST_DIAGNOSTICS` action and navigates the user to a new "Diagnostics" tab on the agent details page
- `File` and `Date`
- `/agents`
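The implementation list names three endpoints. As a minimal TypeScript sketch, the helpers below only build request descriptors for them and send nothing. The paths are copied from the list; the JSON body `{ action: { type: "REQUEST_DIAGNOSTICS" } }`, the `KibanaRequest` type, and the helper names are assumptions for illustration, not the shipped Fleet API (the PR discussion above also switched downloads to `/api/fleet/agents/files/{fileId}/{fileName}`). The `kbn-xsrf` header is included because Kibana rejects state-changing API calls without it.

```typescript
// Hypothetical request descriptor; a real client would pass this to fetch().
export interface KibanaRequest {
  method: "GET" | "POST";
  path: string;
  headers: Record<string, string>;
  body?: string;
}

// Kibana requires the kbn-xsrf header on state-changing API requests.
const POST_HEADERS: Record<string, string> = {
  "kbn-xsrf": "true",
  "Content-Type": "application/json",
};

const enc = encodeURIComponent;

// Create a REQUEST_DIAGNOSTICS action through the existing agent action API.
// The body shape is an assumption made for this sketch.
export function requestDiagnostics(agentId: string): KibanaRequest {
  return {
    method: "POST",
    path: `/api/fleet/agents/${enc(agentId)}/action`,
    headers: POST_HEADERS,
    body: JSON.stringify({ action: { type: "REQUEST_DIAGNOSTICS" } }),
  };
}

// List the diagnostics uploads recorded for one agent.
export function listUploads(agentId: string): KibanaRequest {
  return { method: "GET", path: `/api/fleet/agents/${enc(agentId)}/uploads`, headers: {} };
}

// Download a file's blob; the trailing filename segment is optional.
export function downloadBlob(fileId: string, fileName?: string): KibanaRequest {
  const suffix = fileName ? `/${enc(fileName)}` : "";
  return { method: "GET", path: `/api/files/files/${enc(fileId)}/blob${suffix}`, headers: {} };
}
```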
Demo
Designs
Overview:
Show individual screens
Agent details screen:
Diagnostics tab:
Agent listing page