V2 agent diagnostics action #1631

@michel-laterman michel-laterman commented Oct 27, 2022

What does this PR do?

  • Add a diagnostics action type and handler to the agent.
  • The diagnostics action uploads the resulting archive to fleet-server using the new file-upload API (added as part of the fleetapi module).
  • Move the archive (zip) assembly from cmd/diagnostics.go into the diagnostics module so that running diagnostics as a command and as an action produces the same archive.
  • Change diagnostics hooks and providers: providers now supply a callback function that can be used to dynamically generate a list of hooks. This was done so that log collection can be provided as hooks when the control server receives the request.
  • Change the hook function to return []byte, time.Time so that file mod times can be provided when reading logs.

Why is it important?

In order to make it easier for customers to debug deployments we are adding the ability to gather diagnostics bundles in the fleet UI in Kibana.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Add a diagnostics action handler to the agent. This will allow a user to
collect a diagnostics bundle from the fleet-ui. Change log collection
from being done in the cmd/diagnostics.go file to dynamically generated
hooks. Change the control server from accepting a list of hooks to a
variadic list of callback functions that return hooks. This is to allow
the agent to dynamically generate a hook for each log (and service log
file) when the diagnostics action is triggered from an action or the
command line. Add a ZipArchive function to the diagnostics package that
will write the zip to the passed writer so that the archive can be
created by a command line invocation or from an action.
Change the Hook function from returning a []byte to a []byte, time.Time
so that the file mod times collected through diagnostics hooks can be
returned from the same filesystem interaction. Add the start of a
specific retrier for the fleetapi/uploader that will resend requests if
a 429 is received.
@michel-laterman michel-laterman added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Oct 27, 2022

mergify bot commented Oct 27, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b v2-agent-diagnostics-action upstream/v2-agent-diagnostics-action
git merge upstream/feature-arch-v2
git push upstream v2-agent-diagnostics-action

@michel-laterman

File upload PR is being worked on here: elastic/fleet-server#1902
The API may change so I will likely need to update this PR to reflect that

cc @pzl


elasticmachine commented Oct 28, 2022

💔 Build Failed

Build stats

  • Start Time: 2022-10-31T17:15:17.680+0000

  • Duration: 13 min 38 sec

Steps errors 4

Go build
  • Took 1 min 53 sec
  • Description: mage build
Go build
  • Took 1 min 4 sec
  • Description: mage build
Go build
  • Took 1 min 14 sec
  • Description: mage build
Go build
  • Took 1 min 17 sec
  • Description: mage build

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)


juliaElastic commented Nov 1, 2022

@michel-laterman Can we make sure to add the diagnostics file ID to the ActionResult data that is sent back on ack and saved to .fleet-action-results? We need that to query the file that corresponds to the triggering action on Fleet UI.

Something like this would be great:

{ "action_id": "<action_id>", "data": { "file_id": "<file_id>" }, ... }

@jlind23 jlind23 linked an issue Nov 3, 2022 that may be closed by this pull request
9 tasks
Err: diag.Err,
Results: files,
})
}
This whole code block looks to be a duplicate of what is in the control server code. We should place this code in a common place instead of copying it.


mergify bot commented Nov 8, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b v2-agent-diagnostics-action upstream/v2-agent-diagnostics-action
git merge upstream/feature-arch-v2
git push upstream v2-agent-diagnostics-action

@blakerouse blakerouse deleted the branch elastic:feature-arch-v2 November 9, 2022 16:42
@blakerouse blakerouse closed this Nov 9, 2022
@michel-laterman michel-laterman mentioned this pull request Nov 9, 2022
6 tasks
@michel-laterman michel-laterman deleted the v2-agent-diagnostics-action branch April 12, 2023 19:19
Development

Successfully merging this pull request may close these issues.

[Fleet] Support new "request diagnostics" action type
4 participants