-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet] Finer-grained error information from install/upgrade API #95649
Conversation
166f76f
to
716957a
Compare
@jen-huang after 716957a installing a broken package with a direct POST should correctly report errors and attempt to roll back, but the bulk install code path (which is also used by the setup code path) will still fail. Feel free to have a look, otherwise I'll just continue working on it. |
8bed5f9
to
3f217d8
Compare
Pinging @elastic/fleet (Feature:EPM) |
@jen-huang This is ready for review. If you don't have time to do all the manual tests, could you review the test scenarios in the initial description to check if I missed something important? @kevinlog This touches the bulk install functionality used by the security solution, who from your team could review this? |
@afgomez maybe you could have a look at this too, and maybe you find the test descriptions helpful for other EPM work. |
f1a8da4
to
f2243fc
Compare
f2243fc
to
9e9e11e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not run this locally but I am comfortable merging it based on the description, tests, and my understanding of the code.
Let's get this in before FF and keep 👁️ and👂open for any issues 🚀
9e9e11e
to
d3a4379
Compare
++ on getting this in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this will be really helpful moving forward
a198cb6
to
9fa2e24
Compare
💚 Build SucceededMetrics [docs]Unknown metric groupsAPI count
API count missing comments
History
To update your PR or re-run it, just comment with: |
…stic#95649) * Intercept installation errors and add meta info. * Adjust mock. * Catch errors in all steps of install/upgrade. * Adjust handler for direct package upload. * Don't throw not-found errors on assets during rollback. * Correctly catch errors from _installPackage() * Propagate error from installResult in bulk install case. * Add tests for rollback. * Remove unused code. * Skipping test that doesn't test what it says. * Fix and reenable test.
) (#97400) * Intercept installation errors and add meta info. * Adjust mock. * Catch errors in all steps of install/upgrade. * Adjust handler for direct package upload. * Don't throw not-found errors on assets during rollback. * Correctly catch errors from _installPackage() * Propagate error from installResult in bulk install case. * Add tests for rollback. * Remove unused code. * Skipping test that doesn't test what it says. * Fix and reenable test.
…te-legacy-es-client * 'master' of github.com:elastic/kibana: (102 commits) [Exploratory view] integerate page views to exploratory view (elastic#97258) Fix typo in license_api_guard README name and import http server mocks from public interface (elastic#97334) Avoid mutating KQL query when validating it (elastic#97081) Add description as title on tag badge (elastic#97109) Remove legacy ES client usages in `home` and `xpack_legacy` (elastic#97359) [Fleet] Finer-grained error information from install/upgrade API (elastic#95649) Rule registry bundle size (elastic#97251) [Partial Results] Move other bucket into Search Source (elastic#96384) [Dashboard] Makes lens default editor for creating new panels (elastic#96181) skip flaky suite (elastic#97387) [Asset Management] Agent picker follow up (elastic#97357) skip flaky suite (elastic#97382) [Security Solutions] Fixes flake with cypress tests (elastic#97329) skip flaky suite (elastic#97355) Skip test to try and stabilize master minimize number of so fild asserted in tests. it creates flakines when implementation details change (elastic#97374) [Search Sessions] Client side search cache (elastic#92439) [SavedObjects] Add aggregations support (elastic#96292) [Reporting] Remove legacy elasticsearch client usage from the reporting plugin (elastic#97184) [kbnClient] fix basePath handling and export reponse type (elastic#97277) ... # Conflicts: # x-pack/plugins/watcher/server/lib/license_pre_routing_factory/license_pre_routing_factory.ts # x-pack/plugins/watcher/server/plugin.ts # x-pack/plugins/watcher/server/routes/api/indices/register_get_route.ts # x-pack/plugins/watcher/server/routes/api/license/register_refresh_route.ts # x-pack/plugins/watcher/server/routes/api/register_list_fields_route.ts # x-pack/plugins/watcher/server/routes/api/register_load_history_route.ts # x-pack/plugins/watcher/server/routes/api/settings/register_load_route.ts # x-pack/plugins/watcher/server/routes/api/watch/action/register_acknowledge_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_activate_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_deactivate_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_delete_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_execute_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_history_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_load_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_save_route.ts # x-pack/plugins/watcher/server/routes/api/watch/register_visualize_route.ts # x-pack/plugins/watcher/server/routes/api/watches/register_delete_route.ts # x-pack/plugins/watcher/server/routes/api/watches/register_list_route.ts # x-pack/plugins/watcher/server/shared_imports.ts # x-pack/plugins/watcher/server/types.ts
Summary
Partially implements #91864
The approach, trying to stay minimally invasive, is:
_installPackage()
unchanged. If errors happen, they are thrown upwards.installPackageFromRegistry()
andinstallPackageByUpload()
, add atry / catch
block around the call to_installPackage()
. If errors are caught, they are not thrown, but added to theinstallResult
they return, along with the information whatinstallType
this operation was (install
,upgrade
etc.)InstallResult
is changedInstallResult
is returned frominstallPackage()
installPackage()
inspect the return value and throw any error they find, so that overall behavior isn't changed.This is all in preparation for callers being able to not re-throw the error they find, but report it back to the user in case they deem it non-fatal. This will happen in a second PR.
How to test this
In all tests, adjust
BASEPATH
to match your locally running system, or remove it. For reference,BASEPATH
in allcurl
commands below isrch
. (You can set this withserver.basePath: "/rch"
inkibana.dev.yml
)After the changes in this PR, the behavior in error situations should be exactly like before. Unfortunately, not all error scenarios are covered by our tests run in CI and need to be tested manually. Specifically, testing the
setup
API endpoints with broken required packages is not possible in CI, as we can have only one registry setup in CI (a combination of a docker container containing the registry and most packages, and the packages contained infleet_api_integration/api/fixtures
), and we need a workingsetup
for most other integration tests.Tests with
system
Craft a
system
package that triggers an error during installation, serve it from a locally running registry. Do do so, edit any dashboard insystem/0.10.9
to includeinstead of
7.3.0
.Use this to test the error during a plain installation, by calling
422 Unprocessable entity
Note that the rollback refuses to uninstall
system
because it is a required package.Now delete
system
withforce: true
like this:and verify it is uninstalled with
The response should contain
"status":"not_installed"
.Use the same broken
system
package as above.Install a previous, non-broken version of the package with
Then update the
system
package with a call toObserve the errors in the return value from the API call, and in the log. Verify that the rollback to the older version worked with another call to
and searching for the value of
status
, it should beinstalled
. Observe that"install_version":"0.10.9"
and"install_status":"installing"
, these are leftovers of the failed upgrade and subsequent rollback.Delete the
system
package again withNow use the same broken
system
package to test the setup and bulk install endpoints. Test the fresh install of the brokensystem-0.10.9
as well as the update from the non brokensystem-0.10.7
to the brokensystem-0.10.9
. The relevantcurl
commands are:Tests with
endpoint
One way to break the endpoint package is to make one of the ingest pipelines in the contained data streams invalid by changing
"ignore_failure": true
to"ignore_failures": true
in theprocessor
(or introducing any other syntax error here).Then open the UI and navigate directly to Security -> Overview. (Do NOT open the Fleet page first, as this would call the setup endpoints which would invalidate this test.)
Observe the call to the
/api/fleet/epm/packages/_bulk
endpoint in the network requests. It responds with200 OK
, and the response{"response":[{"name":"endpoint","statusCode":500,"error":"parse_exception"}]}
. This behavior is unchanged from before. (Theparse_exception
comes from the invalid ingest pipeline.)In the log, observe these errors:
(Specifically, note that Fleet again refuses to uninstall the
endpoint
package during rollback as it is a required package.)Additional tests would be to first install a non-broken earlier version of
endpoint
, then open the Security UI and verify that an attempt to update the package to the current, broken version was made and the rollback was successful. This is, however, already covered by testing the_bulk
endpoint in isolation as described above.Other possible tests
/epm/packages
route (optional, if you're only testing through the API withcurl
calls, as you can installsystem
like any other package before the UI triggers the setup)In addition to that, verify that normal functionality is not broken, i.e. try to break it in any way you can think of.