Proposal: Move to Sidecar-based Plugins #4042

DerekTBrown · 2025-01-10T01:38:55Z

Problem Statement

In utilizing various plugins, we have identified a few key limitations of the current plugin architecture:

1. Lack of Observability/Prometheus Metrics

Plugins are forked at startup from the main process. This makes them harder to observe:

There is no interface for plugins to emit metrics via the standard Prometheus endpoint.
Plugins will not appear in pprof profiles generated using the argo-rollouts pprof endpoint.

Moreover, it takes a significant amount of re-configuration to enable metrics and pprof endpoints for plugins. The user needs to:

Expose the metrics/pprof endpoints within the plugin server (as in any other application), likely configurable via command-line flags.
Modify the argo-rollouts Deployment object to expose these ports.
(In the case of monitoring) further modify the argo-rollouts scrape configuration to scrape the port of every configured plugin.

2. Runtime download of HTTP Artifacts or forking the Docker image

Users have a choice between http and file locations for plugins (code). Neither one of these is ideal:

HTTP-based Plugins

http plugin locations create a reliability risk for the user, since they depend on arbitrary HTTP endpoints that may be sporadically unavailable.
This method of importing binaries isn't idiomatic and doesn't play nicely with other ecosystem components:
- Vulnerabilities in the plugin binary may slip past vuln scanning, since the binary is imported at runtime.
- The plugin version is effectively specified in an arbitrary string. This makes it harder for automation (and engineers operating according to standard Helm/Docker practices) to identify that the version needs to be upgraded.

File-based Plugins

file plugin locations are difficult to implement for users, since they are responsible for placing these plugin binaries on the argo-rollouts filesystem.
- Idiomatically, the user would maintain a Docker build pipeline that uses the open-source argo-rollouts image as a base and adds the requisite plugins to the filesystem.
- Note: This will get somewhat easier if/when image volumes are GA-ed (VolumeSource: OCI Artifact and/or Image kubernetes/enhancements#4639), since plugins can be published as Docker images.

Proposal

argo-rollouts should gradually migrate from launching go-plugins as processes within the argo-rollouts container to running them as separate sidecar containers managed by Kubernetes. This has a number of benefits:

Plugins can be distributed as standalone Docker images. This allows for idiomatic versioning and caching.
Separate plugin sidecars make the system easier for users to understand and debug:
- Users can clearly identify which plugins are failing using standard Kubernetes tooling (e.g. viewing container exits, logs, etc.).
- Users can identify what is/isn't being effectively monitored.
  - For example, some plugins may have a separate monitoring endpoint, which users can see a PodMonitor for.
  - Alternatively, some plugins may lack quality monitoring, which would be clear to users.
- Similarly, because ports can be exposed directly via the K8s APIs, users can easily identify debug/pprof endpoints.

This design could be implemented using Helm templates. A user would import a plugin chart that exports a library template. This template could then be invoked to add the sidecar configuration.
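
To make the library-template idea concrete, here is a minimal sketch using Go's `text/template`, the engine Helm templates are built on. The template name `example-plugin.sidecar`, the values struct, and the `example.com/example-plugin` image are all hypothetical. A real chart would more likely use Helm's `include` with `nindent` rather than the bare `template` action shown here.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"text/template"
)

// sidecarValues stands in for the values a plugin chart might expose (hypothetical).
type sidecarValues struct {
	Name        string
	Image       string
	Tag         string
	MetricsPort int
}

// pluginLibrary plays the role of the library template exported by a plugin chart.
const pluginLibrary = `{{- define "example-plugin.sidecar" -}}
- name: {{ .Name }}
  image: {{ .Image }}:{{ .Tag }}
  ports:
  - name: metrics
    containerPort: {{ .MetricsPort }}
{{- end -}}`

// controllerContainers plays the role of the argo-rollouts chart invoking that template.
const controllerContainers = `containers:
- name: argo-rollouts
  image: quay.io/argoproj/argo-rollouts
{{ template "example-plugin.sidecar" . }}
`

// renderContainers parses both templates into one set and renders the container list.
func renderContainers(v sidecarValues) (string, error) {
	t, err := template.New("deployment").Parse(controllerContainers)
	if err != nil {
		return "", err
	}
	if _, err := t.New("library").Parse(pluginLibrary); err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, v); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderContainers(sidecarValues{
		Name:        "example-plugin",
		Image:       "example.com/example-plugin", // hypothetical image
		Tag:         "v0.1.0",
		MetricsPort: 9090,
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

The plugin version becomes an ordinary image tag and the metrics port becomes a named container port, so both are visible to standard tooling.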

Alternatives Considered

The alternative is likely a combination of separate fixes, one for each of the problems above:

1. Lack of Observability/Prometheus Metrics

Prometheus

Alternatively, go-plugin or argo-rollouts could be extended to provide a metrics interface between the main process and plugin components.
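
One possible shape for such an interface, sketched purely as an assumption: `MetricsSource`, `CollectMetrics`, and the `plugin_up` gauge are hypothetical names, and in practice the call would cross the go-plugin RPC boundary rather than being an in-process method call. The main process gathers samples from every plugin and renders them with a `plugin` label on its own Prometheus endpoint.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MetricsSource is a hypothetical interface each plugin would implement.
type MetricsSource interface {
	CollectMetrics() (map[string]float64, error)
}

// staticSource is a trivial in-memory implementation used for illustration.
type staticSource map[string]float64

func (s staticSource) CollectMetrics() (map[string]float64, error) { return s, nil }

// renderAll collects samples from every plugin and renders them in the
// Prometheus text format, labeling each sample with the plugin name.
// Plugins and metric names are sorted so the output is deterministic.
func renderAll(plugins map[string]MetricsSource) string {
	names := make([]string, 0, len(plugins))
	for name := range plugins {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		samples, err := plugins[name].CollectMetrics()
		if err != nil {
			// Report the failed collection instead of dropping the plugin silently.
			fmt.Fprintf(&b, "plugin_up{plugin=%q} 0\n", name)
			continue
		}
		fmt.Fprintf(&b, "plugin_up{plugin=%q} 1\n", name)
		keys := make([]string, 0, len(samples))
		for k := range samples {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			fmt.Fprintf(&b, "%s{plugin=%q} %g\n", k, name, samples[k])
		}
	}
	return b.String()
}

func main() {
	fmt.Print(renderAll(map[string]MetricsSource{
		"demo": staticSource{"reconcile_total": 2},
	}))
}
```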

`pprof`

Each plugin could be expected to implement its own pprof endpoint.
The central argo-rollouts pprof endpoint could then proxy into the plugin endpoints.

2. Runtime download of HTTP Artifacts

[Option 1]: Mega Docker Image

The argo-rollouts project could have a central pipeline that exports a "mega" Docker image containing all "well-known" plugins for users to utilize. This would mitigate the risks of using HTTP directly.

[Option 2]: Image Volumes

As mentioned previously, once image volumes are GA, Argo Rollouts could move to using them as the default model for plugin distribution.

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

DerekTBrown · 2025-01-13T16:24:05Z

Alternatively, I think it makes sense to discuss abandoning go-plugin in favor of internalizing the plugins within the codebase. There are only a handful of maintained plugins, and it seems like there are significant benefits to having a singular codebase.

DerekTBrown added the enhancement label Jan 10, 2025

DerekTBrown mentioned this issue Jan 13, 2025

[feat] ability to customize kube client qps argoproj-labs/rollouts-plugin-trafficrouter-gatewayapi#101

Merged

kostis-codefresh assigned zachaller Jan 13, 2025
