additional ca-bundle target option: pvc #381

Closed
lknite opened this issue Jul 9, 2024 · 11 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@lknite

lknite commented Jul 9, 2024

I have encountered the situation where the size of a ca-bundle exceeds the allowed size of a configmap and/or secret.
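
To make the size problem concrete, here is a minimal sketch of a pre-flight check. It assumes the documented Kubernetes limit of 1 MiB of data per ConfigMap or Secret and ignores metadata and encoding overhead, so treat the result as an estimate. The function name `fits_in_configmap` is hypothetical and not part of trust-manager.

```python
# Sketch only: estimate whether a PEM bundle fits in one ConfigMap/Secret.
# Assumption: the limit is 1 MiB of data per object; object metadata and
# any base64 overhead (for Secrets) are not accounted for here.

CONFIGMAP_MAX_BYTES = 1024 * 1024  # 1 MiB


def fits_in_configmap(bundle_pem: str, limit: int = CONFIGMAP_MAX_BYTES) -> bool:
    """Return True if the UTF-8 encoded bundle is within the size limit."""
    return len(bundle_pem.encode("utf-8")) <= limit


if __name__ == "__main__":
    # Placeholder text standing in for a real bundle.
    oversized = "x" * (CONFIGMAP_MAX_BYTES + 1)
    print(fits_in_configmap(oversized))
```

A bundle that fails this check is the case this issue is about.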

Options:

  • the allowed size of a configmap and/or secret can be increased, though this will increase what's allowed cluster-wide
  • use a pvc instead, which has no fixed size limit; the size limit is handled through other means, often a csi provisioner

Reference:

@erikgb
Contributor

erikgb commented Jul 15, 2024

This use case is probably better supported by trust-manager csi-driver, @ThatsMrTalbot? 😉

@ThatsMrTalbot
Contributor

ThatsMrTalbot commented Jul 15, 2024

In its current implementation the POC CSI driver loads the bundle from the secret/configmap so would have the same issue.

However I would not want to implement syncing to a PVC as targeting a PVC brings in more complications, for example:

  • ReadWriteOnce would require us to create a volume for every Pod, and creating/syncing a volume on Pod creation would block Pod startup.
  • Not all CSI drivers implement ReadWriteMany.
  • Some CSI drivers create volumes that have zonal restrictions, meaning we would need to run trust-manager in every zone.

Other implementation options:

  • CSI driver (as mentioned above)
  • Init container that writes the bundle to an emptyDir

@lknite
Author

lknite commented Jul 16, 2024

  • emptyDirs are not always allowed because they use space on a worker node's hard drive, but generally pvcs are
  • i don't know about zonal restrictions, but would you say that's so common as to really be an issue?
  • folks choosing the pvc option are doing so because configmap and secret will not work due to size restrictions, so waiting a bit when needed may be a completely acceptable choice

@ThatsMrTalbot
Contributor

I have put a bunch of thought into the implementation details, and I really don't think writing to PVCs is a feasible option.

If the CSI only supports ReadWriteOnce (EBS for example):

  • ReadWriteOnce cannot be attached to multiple Pods
  • Since multiple Pods cannot mount the same volume, a PVC must be created per Pod.
  • The moment we create the PVC, it will be attached to the target Pod meaning the trust-manager pod cannot mount it to write the bundle.
  • We would have to create a different PVC, create a VolumeSnapshot, then create the Pod's PVC using that VolumeSnapshot.
  • VolumeSnapshots are not supported by every CSI provider.
  • Even with all this, the Pod startup delay would be massive.

If the CSI supports ReadWriteMany (EFS for example):

  • We can create a single PVC and mount it to multiple Pods.
  • If the CSI mounts can only be mounted in a single AZ, we would need to create a PVC per zone and run a trust-manager per zone in order to write the bundle to the PVC.

Using PVCs creates a massive dependency on a specific feature set that CSI must implement.
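
The reasoning in the two cases above can be summarised as a small sketch. It only encodes the argument made in this comment (one PVC per Pod for ReadWriteOnce, one PVC or one per zone for ReadWriteMany). The function name `pvcs_required` and its parameters are hypothetical and do not correspond to any trust-manager or Kubernetes API.

```python
# Sketch only: how many PVCs the approaches discussed above would need.
# Assumption: this mirrors the reasoning in the comment, not real scheduler
# or CSI behaviour, which depends on the specific driver.


def pvcs_required(access_mode: str, pod_count: int, zone_count: int, zonal: bool) -> int:
    """Return the number of PVCs needed to give every Pod the bundle."""
    if access_mode == "ReadWriteOnce":
        # The volume cannot be shared, so each Pod needs its own PVC.
        return pod_count
    if access_mode == "ReadWriteMany":
        # One shared PVC, or one per zone if volumes are zone-restricted.
        return zone_count if zonal else 1
    raise ValueError(f"unsupported access mode: {access_mode}")


if __name__ == "__main__":
    print(pvcs_required("ReadWriteOnce", pod_count=50, zone_count=3, zonal=True))
    print(pvcs_required("ReadWriteMany", pod_count=50, zone_count=3, zonal=True))
```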


In regards to some of your comments:

emptyDirs are not always allowed because they use space on a worker node's hard drive, but generally pvcs are

EmptyDir can be used to create an in-memory FS with a size limit, without writing anything to the host's disk. The limit would not even need to be that big; 5 MB can hold a lot of certificates.
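
As an illustration of the in-memory emptyDir idea combined with the init container option mentioned earlier, here is a minimal sketch that builds a Pod manifest as JSON (which `kubectl apply -f` accepts). The `emptyDir.medium` and `emptyDir.sizeLimit` fields are standard Kubernetes fields. Everything else is an assumption for illustration: the image names, the mount path, and the `cp` command, which presumes a hypothetical image that ships the bundle at `/bundle/ca.pem`.

```python
import json

# Sketch only: a Pod whose init container copies a CA bundle into a
# memory-backed emptyDir that the app container mounts read-only.
# Assumptions: image names, paths, and the copy command are hypothetical.


def pod_with_bundle_init_container(bundle_image: str, app_image: str) -> dict:
    volume = {
        "name": "ca-bundle",
        # medium "Memory" backs the volume with tmpfs; sizeLimit caps its size.
        "emptyDir": {"medium": "Memory", "sizeLimit": "5Mi"},
    }
    mount = {"name": "ca-bundle", "mountPath": "/etc/ssl/bundle"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "app-with-ca-bundle"},
        "spec": {
            "volumes": [volume],
            "initContainers": [{
                "name": "write-bundle",
                "image": bundle_image,
                "command": ["sh", "-c", "cp /bundle/ca.pem /etc/ssl/bundle/ca.pem"],
                "volumeMounts": [mount],
            }],
            "containers": [{
                "name": "app",
                "image": app_image,
                "volumeMounts": [dict(mount, readOnly=True)],
            }],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_bundle_init_container(
        "example.com/ca-bundle:latest", "example.com/app:latest"
    )
    print(json.dumps(manifest, indent=2))
```

Note that memory-backed emptyDir usage counts against the container memory limits of the Pod, so the size limit should be chosen with that in mind.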

i don't know about zonal restrictions, but would you say that's so common as to really be an issue?

The AWS EBS CSI driver has zonal restrictions. The AWS EFS driver may have zonal limits, depending on configuration.


Honestly I think the perfect solution is writing our own CSI driver. This would have the following benefits:

  • No startup delay, the driver we write would handle the bundle aggregation and write it to a tmpfs mount.
  • No size limit since the driver is doing the aggregation.
  • No dependency on another CSI provider with a specific feature set to create PVCs
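
To give a rough idea of what "bundle aggregation" could look like, here is a minimal sketch that merges several PEM sources into one bundle and drops duplicate certificates. This is not the trust-manager or CSI driver implementation. The function name `aggregate_bundles` is hypothetical, and the sketch compares PEM text only rather than parsing the certificates.

```python
import re
from typing import Iterable

# Sketch only: concatenate PEM certificates from several sources and
# drop duplicates. Assumption: duplicates are detected by comparing the
# PEM text with whitespace removed, not by parsing the certificates.

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def aggregate_bundles(sources: Iterable[str]) -> str:
    """Return one PEM bundle with each distinct certificate appearing once."""
    seen = set()
    blocks = []
    for source in sources:
        for block in PEM_BLOCK.findall(source):
            # Strip all whitespace so line-ending differences do not matter.
            key = "".join(block.split())
            if key not in seen:
                seen.add(key)
                blocks.append(block.strip())
    return ("\n".join(blocks) + "\n") if blocks else ""
```

A driver doing this itself would write the result straight to the tmpfs mount, so the output never has to pass through a ConfigMap or Secret.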

@arsenalzp
Contributor

Do you already have a project for a CSI driver for this purpose? I can try to contribute.

@erikgb
Contributor

erikgb commented Jul 21, 2024

Do you already have a project for a CSI driver for this purpose? I can try to contribute.

No, not yet. @ThatsMrTalbot has created a POC for it, but nothing official yet. :-)

@ThatsMrTalbot
Contributor

After a demo at the last community meeting the POC was moved into the cert-manager org:
https://github.com/cert-manager/trust-manager-csi-driver

Currently it loads the secret/configmap that trust-manager created, but that can be changed so it can perform the aggregation itself and thus remove the 1mb limit.

This is still very much in the early stages, but if you want to contribute to the design/build then feel free to get involved. A good place to start is our daily stand-ups or bi-weekly community meetings. See https://cert-manager.io/docs/contributing.

@cert-manager-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale

@cert-manager-prow cert-manager-prow bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 6, 2024
@cert-manager-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale

@cert-manager-prow cert-manager-prow bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 6, 2024
@cert-manager-bot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close

@cert-manager-prow
Contributor

@cert-manager-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
