Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Request diagnostics #142369

Merged
merged 41 commits into from
Nov 10, 2022

Conversation

juliaElastic
Copy link
Contributor

@juliaElastic juliaElastic commented Sep 30, 2022

Summary

Closes #141074

Request diagnostics action

Added new action for single agent (Agent details page and Agent list row actions) to request diagnostics.
When clicking on the action, an API request is made that creates a REQUEST_DIAGNOSTICS type action in .fleet-actions index.

Diagnostics uploads display

When the action is submitted, the user is navigated to the new Agent Details / Diagnostics tab, which shows the list of pending and completed diagnostics file uploads. The information is coming from the /action_status (for action status) as well as the /uploads endpoint (for file name and path)
By clicking on a diagnostics link, the file should be downloaded in zip.

image

Failed uploads display:
image
Expired status was not specified in the design separately, it will be shown like the failed status (with warning icon).

Mock data (blocker)

Currently returning mock data in the /uploads API, because of a blocker in Kibana File Service, see here.

Bulk action

Added bulk action too:
image

Shows up in agent activity:
image

The Fleet Server / Agent changes are not there yet, though FS delivers the action, and Agents ack it (looks like default behavior for unkown actions as well)

Confirmation modal

Added a confirmation modal when clicking on action button everywhere, except for the Request diagnostics button on the Diagnostics page.
Open question:

  • Do we want to display the confirmation window on the Diagnostics page button too?

image

Download

Generated file path to download in this format: /api/fleet/agents/files/{fileId}/{fileName}

Decided not to try to use files plugin's API because it doesn't have the Fleet authorization around it.

Screen recording demonstrating the download of an agent diagnostics zip file, that I uploaded using the Fleet Server upload API (using Dan's pr locally)

file_download.mov

Notification

Added toast message to show up when a diagnostics becomes ready, when we are on the Diagnostics tab.

diag_notif.mov

Checklist

Delete any items that are not applicable to this PR.

@juliaElastic juliaElastic added release_note:feature Makes this part of the condensed release notes v8.6.0 labels Sep 30, 2022
@juliaElastic juliaElastic self-assigned this Sep 30, 2022
@juliaElastic juliaElastic changed the title Feat/request diagnostics [WIP] Request diagnostics Sep 30, 2022
@juliaElastic juliaElastic marked this pull request as ready for review October 5, 2022 08:14
@juliaElastic juliaElastic requested a review from a team as a code owner October 5, 2022 08:14
@juliaElastic juliaElastic changed the title [WIP] Request diagnostics [Fleet] Request diagnostics Oct 5, 2022
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Oct 5, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@joshdover
Copy link
Contributor

@juliaElastic super excited to see so much progress on this already. Do we need to wait for the Fleet Server and Agent dependencies to be merged before we merge this?

@juliaElastic
Copy link
Contributor Author

@joshdover yes, we need the backend implementation of the action, otherwise it would be an action that never completes.
Also we need a real implementation of the uploads api which is blocked currently.

@juliaElastic
Copy link
Contributor Author

@elasticmachine merge upstream

@juliaElastic juliaElastic force-pushed the feat/request-diagnostics branch from 77ee9bf to f165ee7 Compare October 7, 2022 13:32
@juliaElastic
Copy link
Contributor Author

juliaElastic commented Nov 8, 2022

ResponseOps changes look good!

We're chatting about this PR right now, since we don't think we've reviewed your usage of task manager - but can now see it when new task types get added (via the check in check_registered_task_types.ts. Thinking it would be good to get an overview of how you're using it, we may have some thoughts/hints/tips.

@pmuellr See the reasoning here: #138870
To summarize, we have Fleet bulk actions that are being run in async mode for more than 10k agents. The Kibana Task Manager is used to catch errors and trigger retry of actions in a Task, and also to check completion in a Task after 5m.
The same logic is used for all types of actions (upgrade, unenroll, reassign, update tags, request diagnostics).

@@ -15,6 +15,7 @@ export const allowedExperimentalValues = Object.freeze({
createPackagePolicyMultiPageLayout: true,
packageVerification: true,
showDevtoolsRequest: true,
showRequestDiagnostics: false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added feature flag to hide Request diagnostics action by default at 3 places: Agent list single action, bulk action, Agent details action.
For local testing, set this boolean to true.

Feature flag off:
image
image
image

Toggle on:
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this!

@nchaulet nchaulet self-requested a review November 8, 2022 14:02
@kpollich kpollich self-requested a review November 8, 2022 14:20
return actionResults.hits.hits.map((hit) => ({
actionId: hit._source?.action_id as string,
timestamp: hit._source?.['@timestamp'],
fileId: hit._source?.data?.file_id as string,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taking the file_id from action results, this part is not yet implemented, asked here: elastic/elastic-agent#1631 (comment)

>
<FormattedMessage
id="xpack.fleet.requestDiagnostics.calloutText"
defaultMessage="Diagnostics files are stored in Elasticsearch, and as such can incur storage costs. Fleet will automatically remove old diagnostics files after 30 days."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This callout text might have to be updated if we agreed on a different max age then 30 days.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
fleet 733 735 +2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
fleet 898 912 +14

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
fleet 861.4KB 868.8KB +7.4KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
fleet 115.5KB 116.9KB +1.4KB
Unknown metric groups

API count

id before after diff
fleet 1001 1015 +14

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 59 65 +6
osquery 108 113 +5
securitySolution 440 446 +6
total +19

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 67 73 +6
osquery 109 115 +6
securitySolution 517 523 +6
total +20

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

Copy link
Contributor

@criamico criamico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the branch locally, I'm not sure if the upload part can be tested yet but up to the point works as per requirements.
In my local I see that the UI gets stuck in loading state when requesting the diagnostic, so it's difficult to test the rest of the flow.

I just have a question. From the agent activity is possible to see that the diagnostics where requested, but it's hard to know for which agent. It's totally fine in case of bulk requests, but would it make sense to consider adding the agent id for single requests?

Screenshot 2022-11-09 at 16 23 41

Copy link
Contributor

@criamico criamico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM 🚢

@@ -127,6 +127,8 @@ export const AGENT_API_ROUTES = {
BULK_UNENROLL_PATTERN: `${API_ROOT}/agents/bulk_unenroll`,
REASSIGN_PATTERN: `${API_ROOT}/agents/{agentId}/reassign`,
BULK_REASSIGN_PATTERN: `${API_ROOT}/agents/bulk_reassign`,
REQUEST_DIAGNOSTICS_PATTERN: `${API_ROOT}/agents/{agentId}/request_diagnostics`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the new endpoints to the docs? It can be done in a separate PR as well

@juliaElastic
Copy link
Contributor Author

I tested the branch locally, I'm not sure if the upload part can be tested yet but up to the point works as per requirements. In my local I see that the UI gets stuck in loading state when requesting the diagnostic, so it's difficult to test the rest of the flow.

Thanks for the review!
I can document the steps how I tested e2e locally (checking out the fleet-server pr for the upload API, upload a test file with a script, adding some code to kibana that immediately saves the action result with the file id when taking the action).

I just have a question. From the agent activity is possible to see that the diagnostics where requested, but it's hard to know for which agent. It's totally fine in case of bulk requests, but would it make sense to consider adding the agent id for single requests?

We have an enhancement to add more insight into the agents actioned: #141206

I think in Request Diagnostics case this shouldn't be an issue, because when taking the action on a single agent, the UI should navigate to Agent Details / Diagnostics tab.

@juliaElastic
Copy link
Contributor Author

juliaElastic commented Nov 10, 2022

Here are the steps how I tested e2e locally:

make local
./bin/fleet-server --config fleet-server.dev.yml 2>&1
  • upload a test file with a script: extract this zip to your fleet-server directory, run go run ./upload.go elastic-agent-diagnostics-2022-10-04T09-54-34Z-00.zip
    upload_test.zip

  • add some code to kibana that immediately saves the action result with the file id when taking the action:
    replace requestDiagnostics function in x-pack/plugins/fleet/server/services/agents/request_diagnostics.ts:

Show code
export async function requestDiagnostics(
  esClient: ElasticsearchClient,
  agentId: string
): Promise<{ actionId: string }> {
  const id = new Date().toISOString();
  const response = await createAgentAction(esClient, {
    agents: [agentId],
    created_at: new Date().toISOString(),
    type: 'REQUEST_DIAGNOSTICS',
    id,
  });
  await bulkCreateAgentActionResults(
    esClient,
    [agentId].map((agent) => ({
      agentId: agent,
      actionId: id,
      data: {file_id: 'a59cdf67-65d9-4070-90ee-6b7c488215aa.b324764f-3631-40a5-8ff1-756fbbfbc3ac'} // replace with the actual file id, you can check this in console `GET .fleet-agent-files/_search`
    }))
  );
  return { actionId: response.id };
}
  • add data parameter to bulkCreateAgentActionResults in actions.ts:
Show code
export async function bulkCreateAgentActionResults(
  esClient: ElasticsearchClient,
  results: Array<{
    actionId: string;
    agentId: string;
    error?: string;
    data?: any;
  }>
): Promise<void> {
  if (results.length === 0) {
    return;
  }

  const bulkBody = results.flatMap((result) => {
    const body = {
      '@timestamp': new Date().toISOString(),
      action_id: result.actionId,
      agent_id: result.agentId,
      error: result.error,
      data: result.data,
    };

    return [
      {
        create: {
          _id: uuid.v4(),
        },
      },
      body,
    ];
  });

  await esClient.bulk({
    index: AGENT_ACTIONS_RESULTS_INDEX,
    body: bulkBody,
    refresh: 'wait_for',
  });
}
  • Go to Agent Details UI and trigger Request Diagnostics action. The file should immediately be available to download, and the uploaded zip file should be the same as you uploaded.

@juliaElastic juliaElastic merged commit c7cdd00 into elastic:main Nov 10, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Nov 10, 2022
@juliaElastic juliaElastic added release_note:skip Skip the PR/issue when compiling release notes and removed release_note:feature Makes this part of the condensed release notes labels Jan 12, 2023
@juliaElastic
Copy link
Contributor Author

NOTE: this feature is turned off in 8.6 as the backend is not ready yet. The feature is planned to be enabled in 8.7.

dedemorton added a commit that referenced this pull request Jan 12, 2023
Related to elastic/observability-docs#2506

The change for #142369 did not make it into the 8.6 release.
dedemorton added a commit that referenced this pull request Jan 12, 2023
…8842)

Related to elastic/observability-docs#2506

The change for #142369 did not make it into the 8.6 release.

## Summary

n/a

### Checklist

n/a
@amolnater-qasource
Copy link

Hi Team,

We have created 11 testcases for this feature under our Fleet test suite at link:

Please let us know if any other scenario is missing from our end.
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Fleet] Add workflow for requesting and downloading agent diagnostics from Fleet UI
9 participants