Proposal: Move to Sidecar-based Plugins #4042

Open
DerekTBrown opened this issue Jan 10, 2025 · 1 comment
Labels: enhancement (New feature or request)

Problem Statement

While using various plugins, we have identified a few key limitations of the current plugin architecture:

1. Lack of Observability/Prometheus Metrics

Plugins are forked at startup from the main process. This makes them harder to observe:

  • There is no interface for plugins to emit metrics via the standard Prometheus endpoint.
  • Plugins will not appear in pprof profiles generated using the argo-rollouts pprof endpoint.

Moreover, enabling metrics and pprof endpoints for plugins requires significant reconfiguration. The user needs to:

  1. Expose the metrics/pprof endpoints within the plugin server (as in any other application), likely configurable via command-line flags.
  2. Modify the argo-rollouts Deployment object to expose these ports.
  3. (In the case of monitoring) further modify the argo-rollouts monitoring configuration to scrape the ports of all configured plugins.
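
For illustration, step 1 might look roughly like the sketch below inside a plugin binary. The `-debug-addr` flag, the `plugin_requests_total` counter, and the helper names are hypothetical, and the metrics output is hand-written to keep the example stdlib-only; a real plugin would more likely use a Prometheus client library.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"net/http"
	"net/http/pprof"
	"sync/atomic"
)

// requestsTotal is a hypothetical counter a plugin could bump on each call.
var requestsTotal atomic.Int64

// renderMetrics formats the counter in Prometheus text exposition style.
func renderMetrics(n int64) string {
	return fmt.Sprintf("# TYPE plugin_requests_total counter\nplugin_requests_total %d\n", n)
}

// newDebugMux wires the metrics and pprof handlers onto a dedicated mux,
// keeping them separate from the plugin's own RPC traffic.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderMetrics(requestsTotal.Load()))
	})
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

func main() {
	// Hypothetical flag; an empty value leaves the endpoints disabled.
	debugAddr := flag.String("debug-addr", "", "listen address for metrics/pprof, e.g. :9090")
	flag.Parse()
	if *debugAddr == "" {
		log.Println("debug endpoints disabled")
		return
	}
	// A real plugin would also start its plugin server here.
	log.Fatal(http.ListenAndServe(*debugAddr, newDebugMux()))
}
```

Even with this in place, steps 2 and 3 still require changes outside the plugin, which is the friction described above.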

2. Runtime download of HTTP Artifacts or forking the Docker image

Users have a choice between http and file locations for plugins (code). Neither is ideal:

HTTP-based Plugins

  • http plugin locations create a reliability risk for the user, since they depend on arbitrary HTTP endpoints that may be sporadically unavailable.
  • This method of importing binaries isn't idiomatic and doesn't play nicely with other ecosystem components:
    • Vulnerabilities in the plugin binary may slip past vulnerability scanning, since the binary is imported at runtime.
    • The plugin version is effectively specified in an arbitrary string. This makes it harder for automation (and engineers operating according to standard Helm/Docker practices) to identify that the version needs to be upgraded.

File-based Plugins

  • file plugin locations are difficult to implement for users, since they are responsible for placing these plugin binaries on the argo-rollouts filesystem.
    • Idiomatically, the user would maintain a Docker build pipeline that uses the open-source argo-rollouts image as a base and adds the requisite plugins to the filesystem.
    • Note: This will get somewhat easier if/when image volumes reach GA (VolumeSource: OCI Artifact and/or Image kubernetes/enhancements#4639), since plugins can then be published as Docker images.

Proposal

argo-rollouts should gradually migrate from launching go-plugins as processes within the argo-rollouts container to running them as separate sidecar processes managed by Kubernetes. This has a number of benefits:

  1. Plugins can be distributed as standalone Docker images. This allows for idiomatic versioning and caching.

  2. Separate plugin sidecars make the system easier to understand and debug for users:

    • Users can clearly identify which plugins are failing using standard Kubernetes tooling (e.g., viewing container exits, logs, etc.).
    • Users can identify what is/isn't being effectively monitored.
      • For example, some plugins may have a separate monitoring endpoint, which users can see a PodMonitor for.
      • Alternatively, some plugins may lack quality monitoring, which would be clear to users.
    • Similarly, because ports can be exposed directly via the K8s APIs, users can easily identify debug/pprof endpoints.

This design could be implemented using Helm templates. In effect, a user would import a plugin chart that exports a library template, which could then be invoked to add the sidecar configuration.
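
Helm charts are rendered with Go's `text/template`, so the "library template" idea can be sketched directly in Go. Everything named here is an assumption for illustration: the `plugin.sidecar` template name, the `Plugin` fields, the image references, and the shape of the container spec. A real chart would define this in its templates directory and invoke it from the argo-rollouts chart.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Plugin holds the values a hypothetical plugin chart would supply.
type Plugin struct {
	Name        string
	Image       string
	MetricsPort int
}

// sidecarLib plays the role of the library template exported by a plugin chart.
const sidecarLib = `{{ define "plugin.sidecar" -}}
- name: {{ .Name }}
  image: {{ .Image }}
  ports:
    - name: metrics
      containerPort: {{ .MetricsPort }}
{{ end }}`

// controllerTmpl stands in for the argo-rollouts chart invoking that template.
const controllerTmpl = `containers:
- name: argo-rollouts
  image: quay.io/argoproj/argo-rollouts:latest
{{ range . }}{{ template "plugin.sidecar" . }}{{ end }}`

// render expands the controller template with one sidecar per plugin.
func render(plugins []Plugin) (string, error) {
	t := template.New("controller")
	// Parsing the library registers "plugin.sidecar" in the shared namespace.
	if _, err := t.New("lib").Parse(sidecarLib); err != nil {
		return "", err
	}
	if _, err := t.Parse(controllerTmpl); err != nil {
		return "", err
	}
	var sb strings.Builder
	err := t.Execute(&sb, plugins)
	return sb.String(), err
}

func main() {
	out, err := render([]Plugin{
		// Hypothetical plugin image, versioned like any other container.
		{Name: "metric-plugin", Image: "example.com/metric-plugin:v1.2.3", MetricsPort: 9090},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The point of the sketch is that the plugin version becomes an ordinary image tag, and the metrics port becomes a declared container port that standard tooling can discover.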

Alternatives Considered

The alternative is likely a combination of separate fixes, one for each problem above:

1. Lack of Observability/Prometheus Metrics

Prometheus

  • go-plugin or argo-rollouts could be extended to provide a metrics interface between the main process and plugin components.
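
As a rough sketch of what such a metrics interface could look like on the main-process side: the `MetricsProvider` interface, the `Sample` type, and the `plugin` label below are all hypothetical and are not part of go-plugin or argo-rollouts today. In practice the interface would be served over the plugin's existing RPC channel; here a fake in-process implementation stands in for that.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is one metric value reported by a plugin (hypothetical type).
type Sample struct {
	Name  string
	Value float64
}

// MetricsProvider is a hypothetical interface each plugin would implement
// alongside its existing plugin interface.
type MetricsProvider interface {
	CollectMetrics() ([]Sample, error)
}

type row struct {
	name, plugin string
	value        float64
}

// exposition merges samples from all plugins into Prometheus-style text
// lines, labelling each with the plugin it came from. Rows are sorted so
// that lines for the same metric name stay grouped together.
func exposition(plugins map[string]MetricsProvider) string {
	var rows []row
	for pluginName, p := range plugins {
		samples, err := p.CollectMetrics()
		if err != nil {
			continue // a failing plugin should not break the whole scrape
		}
		for _, s := range samples {
			rows = append(rows, row{s.Name, pluginName, s.Value})
		}
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].name != rows[j].name {
			return rows[i].name < rows[j].name
		}
		return rows[i].plugin < rows[j].plugin
	})
	var sb strings.Builder
	for _, r := range rows {
		fmt.Fprintf(&sb, "%s{plugin=%q} %g\n", r.name, r.plugin, r.value)
	}
	return sb.String()
}

// fakePlugin stands in for an RPC client stub talking to a real plugin.
type fakePlugin struct{ samples []Sample }

func (f fakePlugin) CollectMetrics() ([]Sample, error) { return f.samples, nil }

func main() {
	fmt.Print(exposition(map[string]MetricsProvider{
		"metric-plugin": fakePlugin{samples: []Sample{{Name: "plugin_queries_total", Value: 12}}},
	}))
}
```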

pprof

  • Each plugin could be expected to implement its own pprof endpoint.
  • The central argo-rollouts pprof endpoint could then proxy into the plugin endpoints.
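
A minimal sketch of the proxying idea, assuming each plugin serves pprof on a loopback address known to the main process. The `/plugins/<name>/debug/pprof/...` URL layout, the plugin name, and the helper names are assumptions made for illustration, not an existing argo-rollouts API.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// splitPluginPath splits "/plugins/<name>/debug/pprof/heap" into the plugin
// name and the path to forward. Only pprof paths are accepted.
func splitPluginPath(p string) (string, string, bool) {
	trimmed := strings.TrimPrefix(p, "/plugins/")
	if trimmed == p {
		return "", "", false
	}
	name, rest, found := strings.Cut(trimmed, "/")
	rest = "/" + rest
	if !found || name == "" || !strings.HasPrefix(rest, "/debug/pprof/") {
		return "", "", false
	}
	return name, rest, true
}

// pprofProxy forwards pprof requests to the matching plugin endpoint.
func pprofProxy(targets map[string]*url.URL) http.Handler {
	proxies := make(map[string]*httputil.ReverseProxy, len(targets))
	for name, u := range targets {
		proxies[name] = httputil.NewSingleHostReverseProxy(u)
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		name, rest, ok := splitPluginPath(r.URL.Path)
		proxy := proxies[name]
		if !ok || proxy == nil {
			http.NotFound(w, r)
			return
		}
		// Rewrite the path on a copy so the caller's request is untouched.
		out := r.Clone(r.Context())
		out.URL.Path = rest
		out.URL.RawPath = ""
		proxy.ServeHTTP(w, out)
	})
}

func main() {
	// Shows only the routing decision; to serve, mount pprofProxy(targets)
	// on the existing pprof listener.
	name, rest, ok := splitPluginPath("/plugins/metric-plugin/debug/pprof/heap")
	fmt.Println(name, rest, ok)
}
```

Restricting forwarded paths to `/debug/pprof/` keeps the central endpoint from becoming a general-purpose proxy into plugin processes.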

2. Runtime download of HTTP Artifacts

[Option 1]: Mega Docker Image

  • The argo-rollouts project could have a central pipeline that exports a "mega" Docker image containing all "well-known" plugins for users to utilize. This would mitigate the risks of using HTTP directly.

[Option 2]: Image Volumes

  • As mentioned previously, once image volumes are GA, Argo Rollouts could move to using them as the default model for plugin distribution.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

DerekTBrown (Contributor, Author) commented:

Alternatively, I think it makes sense to discuss abandoning go-plugin in favor of internalizing the plugins within the codebase. There are only a handful of maintained plugins, and there seem to be significant benefits to having a single codebase.
