[Security Solution][Detections] Fix fetching package info from registry for installed integrations #134732

banderror · 2022-06-20T10:37:55Z

Fixes: #134639

Summary

In Cloud, the Elastic APM and Fleet Server integrations are installed by default. However, attempts to fetch their packages from the Elastic Package Registry (EPR) via Fleet services on the server side fail with the following errors:

{
    "message": "[email protected] not found",
    "status_code": 500
}

{
    "message": "[email protected] not found",
    "status_code": 500
}

This behavior happens in some Cloud environments (like the one in the related ticket). It does not seem to happen in Cloud CI environments or locally.

This PR adds error handling for this edge case to the GET /internal/detection_engine/fleet/integrations/installed?packages= endpoint.

- It logs fetching errors to the console logs of Kibana.
- It uses a "best-effort" approach for returning data from the endpoint. If we could successfully read existing integration policies, we already have all of the needed data except correct integration titles. So, if after that any request to EPR results in an error, we:
  - Still return 200 with a list of installed integrations
  - Include correct titles for those packages that were successfully fetched
  - Include "best guess" titles for those packages that failed

[2022-06-20T12:57:10.270+02:00][ERROR][plugins.securitySolution] Error fetching package info from registry for [email protected]. Boom!
[2022-06-20T12:57:10.270+02:00][ERROR][plugins.securitySolution] Error fetching package info from registry for [email protected]. Boom!
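
The best-effort behavior described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `resolveTitles`, `guessTitle`, the `fetchPackage` callback, and the `logError` callback are hypothetical names, and the title-guessing rule (capitalizing the words of the package name) is an assumption made only for the example.

```typescript
// Hypothetical shapes, not Fleet's real types.
interface InstalledPackage {
  package_name: string;
  package_version: string;
}

interface RegistryPackage {
  title: string;
}

type FetchPackage = (name: string, version: string) => Promise<RegistryPackage>;

// Assumed fallback rule: "fleet_server" -> "Fleet Server".
function guessTitle(packageName: string): string {
  return packageName
    .split(/[_-]/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(' ');
}

// Never rejects because of a registry failure: each failed fetch is logged
// and replaced with a "best guess" title, so the caller can still return 200.
async function resolveTitles(
  installed: InstalledPackage[],
  fetchPackage: FetchPackage,
  logError: (message: string) => void
): Promise<Array<InstalledPackage & { title: string }>> {
  return Promise.all(
    installed.map(async (pkg) => {
      try {
        const info = await fetchPackage(pkg.package_name, pkg.package_version);
        return { ...pkg, title: info.title };
      } catch (error) {
        const reason = error instanceof Error ? error.message : String(error);
        logError(
          `Error fetching package info from registry for ${pkg.package_name}@${pkg.package_version}. ${reason}`
        );
        return { ...pkg, title: guessTitle(pkg.package_name) };
      }
    })
  );
}
```

The point of the design is that the per-package `try`/`catch` keeps one failing registry lookup from failing the whole response.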

Checklist

- [ ] Unit or functional tests were updated or added to match the most common scenarios

elasticmachine · 2022-06-20T10:47:59Z

Pinging @elastic/security-detections-response (Team:Detections and Resp)

elasticmachine · 2022-06-20T10:48:01Z

Pinging @elastic/security-solution (Team: SecuritySolution)

banderror · 2022-06-20T13:33:02Z

So I tested it in the Cloud CI deployment. There seem to be no errors when fetching Elastic APM and Fleet Server integrations on the BE. This is different from the behavior described in #134639.

Notice the "correct" Elastic APM Integration title. It indicates that the package data with this title was fetched from EPR. If there was a fetch error, it would be the "best guess" title Elastic APM Apmserver (see the PR description).

spong · 2022-06-20T16:21:55Z

...detection_engine/routes/fleet/get_installed_integrations/get_installed_integrations_route.ts

-          set.getPackages().map((packageInfo) => {
-            return fleet.packages.getRegistryPackage(
+        const registryPackages = await initPromisePool({
+          concurrency: MAX_CONCURRENT_REQUESTS_TO_PACKAGE_REGISTRY,


Thanks for adding a pool and capping the requests here! I think we discussed this in the initial PR for this endpoint, but I'd be curious to see what the upper bounds are here for large deployments leveraging integrations. Would be good to test with a large number of integration policies, and then also with a large number of installed integrations.

One issue we have with fetching package policies is we're not handling pagination, so we'll cap out at the max for that initial page. Then for fetching the individual packages, I'm a little worried if there's say 100+ installed packages and we end up making 100+ requests to fleet each time an individual user hits the Rules Mgmt or Details pages. We do have caching client-side through the react-query wrapper, but nothing here server side, so could get messy if there's a few different users bouncing around those pages.

If this becomes an issue in production, users can at least disable this feature and short-circuit this request on the Rule Mgmt page w/ the Kibana Advanced Setting, but definitely something we'll want to test further. Will be good to get feedback from the fleet folks about any other API's we might be able to use here to ease the pressure on the fleet side -- would be great if we could just get all installed packages in one request instead of going the integration policy route.
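
For readers unfamiliar with the pattern, a concurrency-capped pool like the one introduced in the diff above can be sketched as below. This is not Kibana's `initPromisePool`: `runWithConcurrencyLimit`, `PoolOutcome`, and the result shape are hypothetical and only illustrate how a cap bounds the number of in-flight requests to the registry.

```typescript
// Hypothetical result shape: successes and failures are collected separately.
interface PoolOutcome<Item, Result> {
  results: Array<{ item: Item; result: Result }>;
  errors: Array<{ item: Item; error: unknown }>;
}

async function runWithConcurrencyLimit<Item, Result>(
  items: readonly Item[],
  concurrency: number,
  executor: (item: Item) => Promise<Result>
): Promise<PoolOutcome<Item, Result>> {
  const outcome: PoolOutcome<Item, Result> = { results: [], errors: [] };
  let nextIndex = 0;

  // Each worker pulls the next unprocessed item until none are left.
  // There is no await between reading and advancing nextIndex, so two
  // workers can never pick up the same item.
  const worker = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const item = items[nextIndex] as Item;
      nextIndex += 1;
      try {
        outcome.results.push({ item, result: await executor(item) });
      } catch (error) {
        outcome.errors.push({ item, error });
      }
    }
  };

  // At most `concurrency` workers run, so at most that many executor calls are in flight.
  const workerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return outcome;
}
```

Note that a cap only limits parallelism. The total number of requests still grows with the number of installed packages, which is the concern raised in this comment.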

> Would be good to test with a large number of integration policies, and then also with a large number of installed integrations.

Agree, will do it 👍

> One issue we have with fetching package policies is we're not handling pagination, so we'll cap out at the max for that initial page.

I'll check how this method works when the perPage is not specified. I thought it just returns all the policies, but it might also set a default page size instead. Anyway, definitely need to test this 👍
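
If the list call does turn out to be paginated, reading every page could look roughly like the sketch below. The `ListPage` callback and the `{ items, total }` page shape are assumptions made for illustration. They are not taken from the Fleet package policy service, whose actual defaults still need to be checked as noted above.

```typescript
// Assumed page shape and list callback (hypothetical, for illustration only).
interface Page<T> {
  items: T[];
  total: number;
}

type ListPage<T> = (options: { page: number; perPage: number }) => Promise<Page<T>>;

// Requests pages one by one until everything reported by `total` has been
// collected, or until a page comes back empty (a guard against looping forever).
async function listAll<T>(listPage: ListPage<T>, perPage = 100): Promise<T[]> {
  const all: T[] = [];
  let page = 1;
  while (true) {
    const { items, total } = await listPage({ page, perPage });
    all.push(...items);
    if (items.length === 0 || all.length >= total) {
      return all;
    }
    page += 1;
  }
}
```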

> I'm a little worried if there's say 100+ installed packages and we end up making 100+ requests to fleet each time an individual user hits the Rules Mgmt or Details pages. We do have caching client-side through the react-query wrapper, but nothing here server side

FWIW Fleet service caches fetched packages on the server-side. So only the first request may be slow (it depends linearly on the number of installed packages). The subsequent requests should be fast and not generate outgoing HTTP requests to EPR. I'm not sure 100% about this though -- so will test it as well. 👍

I will also test it with a large number of installed packages and a large number of integration policies.
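
To make the caching argument concrete, here is a minimal sketch of a server-side cache keyed by package name and version. It does not describe how Fleet's cache is actually implemented: `withPackageCache` and `Fetcher` are hypothetical names, and the choice not to cache failures is an assumption for the example.

```typescript
type Fetcher<T> = (name: string, version: string) => Promise<T>;

// Wraps a fetcher so that each name@version is requested at most once
// while the cached promise is alive. Failed lookups are evicted so that
// a later call can retry.
function withPackageCache<T>(fetcher: Fetcher<T>): Fetcher<T> {
  const cache = new Map<string, Promise<T>>();
  return (name, version) => {
    const key = `${name}@${version}`;
    const cached = cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const pending = fetcher(name, version).catch((error: unknown) => {
      cache.delete(key); // do not keep failures in the cache
      throw error;
    });
    cache.set(key, pending);
    return pending;
  };
}
```

With a wrapper like this, only the first request pays the cost that grows linearly with the number of installed packages. Repeat requests for the same versions resolve from memory.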

> Will be good to get feedback from the fleet folks about any other API's we might be able to use here to ease the pressure on the fleet side -- would be great if we could just get all installed packages in one request instead of going the integration policy route.

I'll open a discussion issue and share it with Fleet folks. 👍

Thank you @spong, these are a lot of great points!

> FWIW Fleet service caches fetched packages on the server-side. So only the first request may be slow (it depends linearly on the number of installed packages). The subsequent requests should be fast and not generate outgoing HTTP requests to EPR. I'm not sure 100% about this though -- so will test it as well. 👍

This makes me feel a lot better about the upper bound cases. I didn't realize the fleet service cached as well, thanks @banderror!

And sounds good with the other points as well, thank you for verifying here 👍

spong · 2022-06-20T16:24:52Z

...detection_engine/routes/fleet/get_installed_integrations/get_installed_integrations_route.ts

+          const logMessage = `Error fetching package info from registry for ${item.package_name}@${item.package_version}`;
+          const logReason = error instanceof Error ? error.message : String(error);
+          logger.error(`${logMessage}. ${logReason}`);


While it's nice that we're logging these now, I'm curious how actionable this will be for users, and how noisy these logs could be overall. Perhaps debug would be a better log level for now?

@spong I think it depends on the error. Something like [email protected] not found as in the GH issue probably wouldn't be actionable for users. But if a user has a custom package registry, and it's not available, it would likely end up with an exception here and let the user know that they need to fix their environment.

It could be noisy, I agree. I think we have a similar situation with the implementation of IRuleExecutionLogForRoutes. There, we don't know for sure what exceptions we might catch (it could be a network error, a bug in the underlying code of SO Client or Event Log Reader, or a business error), but we log them via logger.error. Would you rather log them on the debug level?

I don't have a strong opinion on this, both options are ok to me.

> @spong I think it depends on the error. Something like [email protected] not found as in the GH issue probably wouldn't be actionable for users. But if a user has a custom package registry, and it's not available, it would likely end up with an exception here and let the user know that they need to fix their environment.

Yeah, that's a good point. I was just thinking from the standpoint of linking this error back to the related integrations feature so there was some context as to where this was bubbling up from.

This was the error I saw in testing:

and I was just thinking that adding some context, like "Unable to retrieve required integrations. Error fetching package info from registry for test@7. whoops", may be helpful.

As for the log-level, no strong opinions here either, but was just trying to think if there's a common scenario where this could be spammy. I'm fine leaving as-is, and we can just check up on things post-release to see what the impact looks like. 👍

Improved console logs a bit in d6a6a46

Errors should be less noisy but still visible to users by default. Added Unable to retrieve installed integrations for more context.

[2022-06-20T21:03:44.551+02:00][ERROR][plugins.securitySolution] Unable to retrieve installed integrations. Error fetching packages from registry: [email protected], [email protected].
[2022-06-20T21:03:44.551+02:00][DEBUG][plugins.securitySolution] Error fetching package info from registry for [email protected]. Boom!
[2022-06-20T21:03:44.551+02:00][DEBUG][plugins.securitySolution] Error fetching package info from registry for [email protected]. Boom!
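
The two-level logging shown in this output can be sketched as follows. This is a reconstruction from the log lines above, not the code in d6a6a46: `logFetchErrors`, `FetchError`, and `MinimalLogger` are hypothetical names, and the logger interface is reduced to the two methods the example needs.

```typescript
// Hypothetical shapes for illustration.
interface FetchError {
  item: { package_name: string; package_version: string };
  error: unknown;
}

interface MinimalLogger {
  error(message: string): void;
  debug(message: string): void;
}

// Emits one error-level summary naming every failed package, then one
// debug-level line per package with the underlying reason.
function logFetchErrors(logger: MinimalLogger, errors: FetchError[]): void {
  if (errors.length === 0) {
    return;
  }
  const ids = errors.map(({ item }) => `${item.package_name}@${item.package_version}`);
  logger.error(
    `Unable to retrieve installed integrations. Error fetching packages from registry: ${ids.join(', ')}.`
  );
  errors.forEach(({ item, error }) => {
    const reason = error instanceof Error ? error.message : String(error);
    logger.debug(
      `Error fetching package info from registry for ${item.package_name}@${item.package_version}. ${reason}`
    );
  });
}
```

Keeping the per-package details at debug level means a default Kibana log shows a single line per failed request, however many packages failed.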

spong

Checked out, tested locally and LGTM! 👍

Left a couple comments around scale testing and a known bug with pagination, but this specific change LGTM with regards to fixing issues when errors are returned from fleet.packages.getRegistryPackage(), so I'm all good for merging and getting this extra guard in. Note: I was not able to do a full e2e test as I'm not sure of the scenarios where fleet would return errors here (still waiting to hear back from QA on any specific config they had that was causing this), but I did repro by throwing an error manually and everything looked good as a result of that.

dplumlee

lgtm!

kibana-ci · 2022-06-20T20:07:15Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

💚 Build #51815 succeeded 58bb62cd4d8f2beea7bdad1500a06f123122b6d3
💔 Build #51786 failed be59ee5aed3d737509b7f0efea1e11c383442d38

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror


kibanamachine · 2022-06-20T20:10:50Z

💚 All backports created successfully

Status	Branch	Result
✅	8.3

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation


banderror · 2022-06-22T12:29:54Z

@spong Yesterday I noticed the same type of errors, but locally. For example, right now I can see this in the logs (my changes are rebased on top of the latest main):

[2022-06-22T14:01:34.683+02:00][ERROR][plugins.securitySolution] Unable to retrieve installed integrations. Error fetching packages from registry: [email protected].
[2022-06-22T14:01:34.684+02:00][DEBUG][plugins.securitySolution] Error fetching package info from registry for [email protected]. [email protected] not found

This is how it looks in the UI.

It feels like whether an error will be thrown or not depends on the requested package version. Maybe. It's weird because the Fleet page itself says that [email protected] is installed, so how could it be not found??

I think there's still some kind of a bug here. I will open a separate ticket for it.

spong · 2022-06-23T00:06:56Z

Yeah, something interesting here... I was able to repro by installing an older version of a package ([email protected]) and got the error in the kibana log the first time, but haven't seen it since. Tried adding another older version package and no error log for that one. Restarted with debugger attached and still not able to reproduce the error from the fleet side.

banderror force-pushed the fix-fetching-package-info-for-installed-integrations branch from 6b3aab9 to be59ee5 Compare June 20, 2022 10:40

banderror self-assigned this Jun 20, 2022

banderror marked this pull request as ready for review June 20, 2022 10:47

banderror requested review from a team as code owners June 20, 2022 10:47

banderror force-pushed the fix-fetching-package-info-for-installed-integrations branch from be59ee5 to 58bb62c Compare June 20, 2022 11:53

banderror mentioned this pull request Jun 20, 2022

[Security Solution] [Inconsistently] Statuses like uninstall, install not displayed when import the rule with related integration #134639

Closed

spong reviewed Jun 20, 2022


spong approved these changes Jun 20, 2022


dplumlee approved these changes Jun 20, 2022


Fix fetching package info from registry for installed integrations

fa76212

banderror force-pushed the fix-fetching-package-info-for-installed-integrations branch from 58bb62c to 068ad21 Compare June 20, 2022 19:05

Improve console logging

d6a6a46

banderror force-pushed the fix-fetching-package-info-for-installed-integrations branch from 068ad21 to d6a6a46 Compare June 20, 2022 19:06

banderror enabled auto-merge (squash) June 20, 2022 19:14

banderror merged commit cdcb272 into elastic:main Jun 20, 2022

kibanamachine mentioned this pull request Jun 20, 2022

[8.3] [Security Solution][Detections] Fix fetching package info from registry for installed integrations (#134732) #134784

Merged

banderror deleted the fix-fetching-package-info-for-installed-integrations branch June 20, 2022 22:43

banderror linked an issue Jun 20, 2022 that may be closed by this pull request

[Security Solution] [Inconsistently] Statuses like uninstall, install not displayed when import the rule with related integration #134639

Closed

spong mentioned this pull request Jul 7, 2022

[Security Solution] Fix performance issues affecting rules management #135311

Merged

tylersmalley added ci:cloud-deploy Create or update a Cloud deployment and removed ci:deploy-cloud labels Aug 17, 2022
