Support deploying central Vault Agent HTTP Caching Proxy #756

Open
Freyert opened this issue Jul 18, 2022 · 2 comments
Labels
enhancement

Comments

Freyert commented Jul 18, 2022

Is your feature request related to a problem? Please describe.

Many dynamic secrets engines cannot support a high number of credential requests from replicated workloads. For example, if the Atlas Secrets Engine needed to provision 100 database credentials for 100 pods, that would likely lock out other vital automation in the Atlas environment, such as backups or scaling.

The solution to this issue is to run a Vault Agent as a Caching Proxy for credential requests. If all pods use a single k8s service account via the Vault Caching Proxy, then the Vault server only provisions a single instance of the dynamic credential for all 100 pods. The credentials are now "service account scoped" instead of "pod scoped".
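For illustration, a minimal Vault Agent caching-proxy configuration might look like the sketch below. The Vault address, auth mount path, and role name are assumptions for the example, not anything the chart provides today:

```hcl
# Hypothetical agent.hcl for a central caching proxy; names are illustrative.
vault {
  # Assumed address of the upstream Vault server.
  address = "https://vault.example.com:8200"
}

auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      # Assumed role bound to the proxy's single k8s service account.
      role = "agent-cache-proxy"
    }
  }
}

cache {
  # Proxied requests without a client token reuse the agent's auto-auth token,
  # so all pods behind the proxy share one credential lease.
  use_auto_auth_token = true
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = true
}
```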

Describe the solution you'd like

Preferably, the Helm chart would support a k8s Deployment that pushes out a cluster (replicated or not) of Vault Agent proxies behind a k8s Service.

Currently #749 attempts to add the Vault Agent proxy as a sidecar for the CSI provider. This provides no benefit for the Vault Injector. A standalone proxy would help both and give operators the control they need to confidently administer Vault workflows.
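As a rough sketch of what the chart could render (resource names, image tag, and labels here are hypothetical, not existing chart output):

```yaml
# Hypothetical manifests the chart could template; names and labels are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vault-agent-cache
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: vault-agent-cache
  template:
    metadata:
      labels:
        app.kubernetes.io/name: vault-agent-cache
    spec:
      serviceAccountName: vault-agent-cache
      containers:
        - name: vault-agent
          image: hashicorp/vault:1.11.0   # assumed tag
          command: ["vault", "agent", "-config=/vault/config/agent.hcl"]
          ports:
            - containerPort: 8200
          volumeMounts:
            - name: config
              mountPath: /vault/config
      volumes:
        - name: config
          configMap:
            name: vault-agent-cache-config   # would hold an agent.hcl like the one above
---
apiVersion: v1
kind: Service
metadata:
  name: vault-agent-cache
spec:
  selector:
    app.kubernetes.io/name: vault-agent-cache
  ports:
    - port: 8200
      targetPort: 8200
```

Workloads would then point VAULT_ADDR at the Service (e.g. http://vault-agent-cache:8200) instead of the Vault server.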

Describe alternatives you've considered

  • I've looked at a lot of "operators" that create k8s Secrets from Vault, but that introduces a lot of moving parts, and we lose the air-gapped model Vault is aiming to provide.

Additional Context

Vault Agent Injector

  • Agent Metrics Configuration vault-k8s#331
    • Wants to be able to configure telemetry on each agent in a pod. That leads to a bunch of low-value time series for Prometheus, etc. A central proxy would be easier to configure and would provide higher-value time series.
  • What happens if Vault goes down? vault-k8s#49
    • A Vault Agent proxy provides an extra layer of redundancy that can be used on top of an HA Vault.
    • HA Vault is great, but it is still vulnerable to misconfiguration.

Secrets CSI Provider

Other Technical Advantages

In general, I think there are strong reasons to treat the Vault Agent Proxy as a standalone deployment:

  1. HA/DR
    • Deploy multiple instances of the cache with topology-aware scheduling to be resilient against zonal failures (see the sketch after this list).
    • Simpler runbooks: scale up or restart an individual component instead of a coupled one.
  2. Monitoring
    • Monitoring all injected agents for the Vault Injector may be untenable for overloaded Prometheus instances.
    • A central cache establishes a good "bottleneck" to monitor in aggregate and then drill down to identify issues.
  3. Improve Cache Hit Rates
    • In large clusters it may be valuable to partition Vault proxies by application into smaller deployments with higher cache hit rates.
  4. More Generic -> More Use Cases
    • Building the Vault Agent proxy into the injector or the CSI provider is a good idea, but a standalone instance can support more use cases.
    • More use cases mean more improvements delivered to a smaller set of files in the codebase.
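As a small sketch of point 1, topology-aware scheduling for such a Deployment could use standard pod topology spread constraints (the label key/value below is an assumption matching the earlier example):

```yaml
# Hypothetical pod spec fragment for spreading proxy replicas across zones.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: vault-agent-cache   # assumed label
```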
Freyert added the enhancement label on Jul 18, 2022

Freyert commented Jul 20, 2022

I was just checking to see if I had missed something, but the StatefulSet does indeed force you to use the vault server command.

New work would be needed to allow deploying Vault Agent. It would also probably be better as a Deployment instead of a StatefulSet.


tomhjp commented Apr 5, 2023

The credentials are now "service account scoped" instead of "pod scoped".

Just to note on this point: to get a cache hit on Agent currently, the token used for logging in has to be the exact same token. But in modern k8s versions every pod gets its own projected service account token with a different TTL, pod owner, etc. So to get cache hits from different pods, we'd either have to engineer every pod to use the same token (probably not tenable), or implement a feature in Agent that allows a cache hit based on some local token validation and service account matching, or some other similar feature that relaxes the requirements for a cache hit without risking impersonation by attackers.

That's not to say it's not possible, but it's a bit more work than it looks like upfront.
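For context on why the tokens differ, here is a sketch of the standard Kubernetes projected token behaviour (not anything Agent-specific): each kubelet-issued token is bound to its pod, so two pods presenting the "same" service account still log in with distinct JWTs.

```yaml
# Sketch of a projected service account token volume; every pod gets its own
# JWT with a pod-specific binding and expiry, so the raw tokens never match.
volumes:
  - name: vault-token
    projected:
      sources:
        - serviceAccountToken:
            audience: vault          # assumed audience
            expirationSeconds: 3600
            path: token
```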
