Support "content-only" integrations #351

Closed
7 tasks done
Tracked by #179907
jsoriano opened this issue Jun 3, 2022 · 15 comments
Labels
discuss Issue needs discussion Team:Ecosystem Label for the Packages Ecosystem team

Comments

@jsoriano
Member

jsoriano commented Jun 3, 2022

After a conversation about #346 with @ruflin we came to the conclusion that we may want to have a package type for additional content, something like DLCs for integration packages.

This package type would only contain assets and a reference to an integration package. Packages of this type could only be installed if the referenced integration package is installed.

Use cases for these packages:

  • Potentially large sample data.
  • Optional assets, such as ML models, that may also be large and may have different licenses or require different subscriptions.
  • Additional dashboards, for specific use cases, and/or maintained by other teams or the community.
  • Assets like dashboards or ML jobs for data collected by OTEL or other collectors.

Data streams, or anything else that alters the ingestion of data, would be excluded from these packages to avoid complex dependencies and interactions.

Some notes about this:

  • From the spec perspective, these packages would be like integration packages, but without data_streams or anything related to ingestion, and with a dependency on the integration package that collects the data. The dependency would need to specify a package name and a version range.
  • Fleet would install these packages as assets of the main package. If the main package is uninstalled, packages with additional content should be uninstalled too.
  • From the package search POV, these packages could be found:
    • From the main package view. For example from the nginx package view, you can also discover "content" packages for nginx.
    • From the package search view, you can also find these packages, and when installing them, the referenced integration is installed too.
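The install and uninstall rules in the notes above can be sketched roughly as follows. This is only an illustration, not Fleet's implementation: `ContentPackage`, `Installer`, and the `min_version`/`max_version` fields are hypothetical names, and the version range is simplified to an inclusive lower bound and an exclusive upper bound on dotted numeric versions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "1.5.0" into (1, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class ContentPackage:
    # All field names are hypothetical, chosen for illustration only.
    name: str
    requires: str      # name of the integration package this one extends
    min_version: str   # inclusive lower bound of the accepted range
    max_version: str   # exclusive upper bound of the accepted range


@dataclass
class Installer:
    integrations: Dict[str, str] = field(default_factory=dict)  # name -> installed version
    content: Dict[str, ContentPackage] = field(default_factory=dict)

    def install_integration(self, name: str, version: str) -> None:
        self.integrations[name] = version

    def install_content(self, pkg: ContentPackage) -> None:
        """Install a content package only if its integration is present and in range."""
        installed = self.integrations.get(pkg.requires)
        if installed is None:
            raise RuntimeError(f"{pkg.name} requires {pkg.requires}, which is not installed")
        low, high = parse_version(pkg.min_version), parse_version(pkg.max_version)
        if not low <= parse_version(installed) < high:
            raise RuntimeError(f"{pkg.requires} {installed} is outside the required range")
        self.content[pkg.name] = pkg

    def uninstall_integration(self, name: str) -> None:
        """Remove the integration and every content package that depends on it."""
        self.integrations.pop(name, None)
        dependents = [n for n, p in self.content.items() if p.requires == name]
        for pkg_name in dependents:
            del self.content[pkg_name]
```

The sketch refuses to install a content package whose integration is missing; the alternative mentioned for the package search view, installing the referenced integration automatically, would replace the first `RuntimeError` with an install step.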

Tasks

@jsoriano jsoriano added the discuss Issue needs discussion label Jun 3, 2022
@jlind23 jlind23 added the Team:Ecosystem Label for the Packages Ecosystem team label Jun 3, 2022
@felixbarny
Member

> with a dependency on the integration package that collects the data

Just the ability to define such a dependency would be a very useful feature on its own. We have a similar use case in APM, where we would like a Java attacher integration to be able to declare a dependency on the APM integration.

cc @eyalkoren @Mpdreamz

@jsoriano
Member Author

jsoriano commented Jun 8, 2022

> Java attacher integration

What kind of content would these attacher integrations contain? For this proposal I am excluding in principle anything related to data ingestion.

Also, we don't want to open the door to dependencies in a general sense; the dependencies here would be only between the content package and the package it extends.

@Mpdreamz
Member

Mpdreamz commented Jun 9, 2022

In our case it would be a binary that elastic-agent needs to supervise. See https://github.com/elastic/ingest-dev/issues/982 for a discussion on why we feel this responsibility should not fall on the main package.

> Also, we don't want to open the door to dependencies in a general sense; the dependencies here would be only between the content package and the package it extends.

++ fully agree, package management is hard 😄

That's why the design doc for our feature request on this topic explicitly named them subpackages. A subpackage should only ever belong to one main package.

@jsoriano
Member Author

jsoriano commented Jun 9, 2022

Well, I think that distributing binaries that elastic-agent executes is a different and broader discussion 🙂 I would leave this out of this proposal.

I guess that packages with collector binaries will also need to include information, like mappings, about the additional data these binaries collect; this is something I am explicitly excluding here.

Also, the direction of the dependencies could be different for this kind of package. If we have packages to distribute collector binaries, we could have a "collector" package for Metricbeat, and all integration packages with metrics collected by Metricbeat would depend on it.

@ruflin
Member

ruflin commented Nov 30, 2023

This topic popped up in the context of a potential stream command in elastic-package that would let you ship synthetic data for any package in the registry to your cluster: elastic/elastic-package#1541. The problem is that the files under _dev are not part of the package served by the registry; the same goes for testdata directories and others. These files are removed during the build process (couldn't find the code quickly), which makes sense, as including them would increase the package size.

The simplest scenario I could see here is that during the publishing process, there is an elastic-package build and an elastic-package build -raw that copies over all the files. This would result in a package like kubernetes-1.55.0-raw.zip, which could also be downloaded from the registry but is not directly used by Fleet itself (at least not by default) but is useful for development.
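To make the difference between the two builds concrete, here is a rough sketch of the file selection. It is not the actual elastic-package build code: the function names are hypothetical, and it assumes that `_dev` and `testdata` are the only directory names stripped from a regular build.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Assumption: directory names that a regular build strips from the package.
DEV_ONLY_DIRS = {"_dev", "testdata"}


def files_for_build(paths: Iterable[str], raw: bool = False) -> List[str]:
    """Return the package-relative paths to include in the built archive."""
    if raw:
        # A raw build copies over all the files.
        return list(paths)
    # A regular build drops any file that sits under a development-only directory.
    return [p for p in paths if not DEV_ONLY_DIRS.intersection(PurePosixPath(p).parts)]


def archive_name(package: str, version: str, raw: bool = False) -> str:
    """Build the archive file name, adding a -raw suffix for raw builds."""
    suffix = "-raw" if raw else ""
    return f"{package}-{version}{suffix}.zip"
```

With this split, the regular archive stays small for Fleet, while the raw archive keeps the test and sample files that a stream command would need.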

@jsoriano
Member Author

> The simplest scenario I could see here is that during the publishing process, there is an elastic-package build and an elastic-package build -raw that copies over all the files. This would result in a package like kubernetes-1.55.0-raw.zip, which could also be downloaded from the registry but is not directly used by Fleet itself (at least not by default) but is useful for development.

I think it could make sense to build and publish source packages, but I would consider this a different issue. This is a common practice in other open source packaging systems. I have created an issue to support that: elastic/elastic-package#1577

Another option would be to include the build information, something we also want to do (#446). This would make it possible to find the source files in the source repository, but it would indeed be an additional step.

@kpollich kpollich changed the title from [Discuss][Change Proposal] Add a package type for additional content to [Discuss][Change Proposal] Add a package type for "assets only" integrations Jun 27, 2024
@kpollich
Member

Bumping this as it'll be quite important as part of the OTel integrations project, but also for things like https://github.com/elastic/integrations/tree/main/packages/security_detection_engine which has its own set of issues with installation due to its size. Having it designated as a content-only integration and going through a different installation mechanism more optimized for large integrations with many assets will be a positive change overall for Fleet's memory footprint and the overall UX of integrations like this.

@kpollich kpollich changed the title from [Discuss][Change Proposal] Add a package type for "assets only" integrations to [Status] Support "content-only" integrations Jul 2, 2024
@kpollich
Member

kpollich commented Jul 2, 2024

Tweaking the title + labels so this appears properly on the OTel board as a milestone

@jsoriano
Member Author

jsoriano commented Jul 2, 2024

Thinking about the OTEL use case, we should probably relax the restrictions on dependencies with integration packages. With OTEL there may be no integration package that collects the data.

@kpollich kpollich changed the title from [Status] Support "content-only" integrations to Support "content-only" integrations Jul 15, 2024
@kpollich
Member

We should look to implement a separate installation path in Kibana that's optimized for content-only integrations. We already have a use case for this with large packages like https://github.com/elastic/integrations/tree/main/packages/security_detection_engine that run into memory issues when bulk deleting/importing assets during the existing installation process.

@xcrzx has been doing great work on improving the memory pressure during package installation for the rules package here elastic/kibana#187969, but these are only incremental improvements that don't necessarily address the root cause in the long term.

For content-only integrations, I think we could optimize Fleet's installation code to avoid potentially expensive operations in a situation where a package has many assets.

@jsoriano
Member Author

Initial definition for content packages merged in the spec #777. Planned to be released as beta in 3.4.0.

Next steps will be to prepare support in elastic-package and the Package Registry. And eventually add support to distribute more kinds of assets and resources.

@kpollich
Member

I think we can consider this done because the UI work is being tracked separately in elastic/kibana#192484.

> Support for discovery features in package-registry.

@mrodm is there anything left to do here to expose the discovery properties in EPR? An issue to implement discovery in Kibana is part of the requirements in the above UI issue, and I don't think we have anything left to implement here.

> Next steps will be to prepare support in elastic-package and the Package Registry. And eventually add support to distribute more kinds of assets and resources.

I created #803 as a follow-up to support more asset types.

With follow-up issues created for the remaining scope I'm closing this. Our initial support for content packages is implemented and well tested. Thanks all!

@mrodm
Contributor

mrodm commented Sep 19, 2024

> Support for discovery features in package-registry.
>
> @mrodm is there anything left to do here to expose the discovery properties in EPR? An issue to implement discovery in Kibana is part of the requirements in the above UI issue, and I don't think we have anything left to implement here.

@kpollich What is still missing is support in Elastic Package Registry for showing/searching packages according to discovery features, that is, support for requests in EPR like

GET /search?discovery=fields:process.pid,user.id

to return packages that can leverage documents that include the process.pid or the user.id fields.
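As a rough illustration of how such a request could be interpreted (not how EPR implements or will implement it), the sketch below parses the `discovery` parameter and returns the packages that declare at least one of the requested fields. The function names and the in-memory `packages` mapping are hypothetical, and the any-field match is an assumption based on the "or" in the description above.

```python
from typing import Dict, List, Set
from urllib.parse import parse_qs, urlsplit


def discovery_fields(url: str) -> Set[str]:
    """Extract field names from a query such as ?discovery=fields:process.pid,user.id."""
    query = parse_qs(urlsplit(url).query)
    fields: Set[str] = set()
    for value in query.get("discovery", []):
        kind, _, names = value.partition(":")
        if kind == "fields":
            fields.update(name for name in names.split(",") if name)
    return fields


def search(url: str, packages: Dict[str, Set[str]]) -> List[str]:
    """Return, sorted by name, the packages that declare any of the requested fields.

    `packages` maps a package name to its declared discovery fields
    (a hypothetical in-memory stand-in for the registry index).
    """
    wanted = discovery_fields(url)
    return sorted(name for name, fields in packages.items() if wanted & fields)
```

In this sketch a request without a `discovery` parameter matches nothing; a real search endpoint would more likely treat the parameter as an optional filter.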

cc @jsoriano

@kpollich
Member

Got it - thanks for clarifying + creating that issue, Mario. I'm not sure yet what the priority is on the discovery feature here, but we will clarify that soon 🙂

@mrodm
Contributor

mrodm commented Sep 19, 2024

> Got it - thanks for clarifying + creating that issue, Mario. I'm not sure yet what the priority is on the discovery feature here, but we will clarify that soon 🙂

@kpollich just created an issue for that elastic/package-registry#1229


7 participants