KEP-596: promote CSIInlineVolume to GA #3158
Conversation
> #### GA
>
> - Remove dependency on deprecated `PodSecurityPolicy` and document new strategy
> - Upgrade / downgrade manual testing
> - Conformance tests implemented
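For context, an ephemeral inline CSI volume is declared directly in the pod spec rather than through a PVC/PV pair. A minimal sketch of what this feature enables (the driver name and volume attributes below are placeholders, not a real driver):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-csi-app                 # hypothetical pod name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
      volumeMounts:
        - name: my-inline-vol
          mountPath: /data
  volumes:
    - name: my-inline-vol
      csi:                                       # inline CSI volume: no PVC or PV objects involved
        driver: inline.storage.example.com       # placeholder driver name
        volumeAttributes:                        # driver-specific; example only
          foo: bar
```

The volume's lifecycle is tied to the pod: it is created when the pod is scheduled and torn down when the pod is deleted.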
Would it be useful to leave the feature in beta and publish updated docs that don't rely on using PodSecurityPolicy, then get feedback for at least 1 release to make sure that folks are happy with how it works?
I added a line to the GA graduation criteria to make sure the docs get updated before graduation. I'm not so sure it's necessary to wait another release cycle after the docs are updated since this feature has already been in beta for >2 years, but I'll defer that decision to the sig-storage leads.
A question: if I'm a cluster admin and volume creation is relatively expensive, but my CSI driver makes it possible, what controls can / should I put in place to stop a single namespace from having a wider impact on my storage? Are these controls different from the restrictions I would need to put in place to stop people using the PersistentVolumeClaim mechanism to achieve the same kind of denial-of-service / spend-all-the-budget impact? My point really is that if this does introduce a new way to turn namespace-scoped access into a shocking cloud bill, we should take that into account.
This has been discussed in the sig-auth meeting several times; the guidance we got is that you should not install "unsafe" CSI drivers in the first place. Every CSI driver that allows "Ephemeral" use is a suspicious one, and it should either be safe by default or come with something that makes it safe, such as an admission plugin. We solve this by documentation and by fixing CSI drivers that are under the kubernetes-sigs umbrella. There is nothing in Kubernetes that would prevent an unsafe (or broken) driver from doing anything with your storage.
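One concrete control a cluster admin does have: a driver only participates in this feature if its `CSIDriver` object opts into the `Ephemeral` lifecycle mode. A sketch (the driver name is a placeholder):

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: inline.storage.example.com   # placeholder driver name
spec:
  # Only drivers that list Ephemeral here can back inline CSI volumes.
  # A driver that lists only Persistent cannot be used inline at all.
  volumeLifecycleModes:
    - Ephemeral
    - Persistent
```

This does not by itself solve the "unsafe driver" problem above, but it means inline use is an explicit, auditable choice per driver rather than something every installed driver gets for free.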
/assign @wojtek-t
Thanks, that makes sense.
> We will update the documentation to include the security aspects of inline CSI volumes and recommend CSI driver vendors not implement inline volumes for persistent storage unless they also provide a 3rd party pod admission plugin.
>
> This is consistent with the proposal by sig-auth in [KEP-2579](https://github.com/kubernetes/enhancements/blob/787515fbfa386bed95ff4d21e472474f61d1c536/keps/sig-auth/2579-psp-replacement/README.md?plain=1#L512-L519) regarding how inline CSI volumes should be handled.
Does the Secret Store CSI Driver handle the security aspects properly?
Sort of.
- The CSI driver needs a user to set up a `SecretProviderClass` CR to provide any secrets to Pods. This is well documented: https://secrets-store-csi-driver.sigs.k8s.io/concepts.html#secretproviderclass
- What is not documented is that the cluster admin should grant permissions to create (update, delete, read) `SecretProviderClasses` only to trustworthy users. Depending on the secret provider underneath and its permissions, a user who can create `SecretProviderClasses` can get any secrets that the underlying secret provider has access to.
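Restricting who can manage `SecretProviderClasses` is ordinary RBAC. A sketch of a namespaced Role and binding, assuming the Secrets Store CSI Driver's API group (`secrets-store.csi.x-k8s.io`); the namespace and subject names are placeholders:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secretproviderclass-editor   # placeholder name
  namespace: team-a                  # placeholder namespace
rules:
  - apiGroups: ["secrets-store.csi.x-k8s.io"]
    resources: ["secretproviderclasses"]
    verbs: ["create", "update", "delete", "get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secretproviderclass-editor
  namespace: team-a
subjects:
  - kind: User
    name: trusted-admin              # placeholder: bind only trustworthy users
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: secretproviderclass-editor
  apiGroup: rbac.authorization.k8s.io
```

The key point from the comment above: no such Role should be granted broadly, because the write verbs are effectively read access to everything the backing secret provider can see.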
I added a bunch of PRR-related questions, but I would also like to see SIG-level approval first.
> ###### Are there any tests for feature enablement/disablement?
>
> Yes, the unit tests will test with the feature gate enabled and disabled.
Do you have any tests that exercise the "switch" (i.e. the test starts with the feature gate enabled and disables it during the test, or vice-versa)?
The description for this question in the template states:
> The e2e framework does not currently support enabling or disabling feature
> gates. However, unit tests in each component dealing with managing data, created
> with and without the feature, are necessary. At the very least, think about
> conversion tests if API types are being modified.
I see quite a few unit tests exercising objects "created with and without the feature". For example the strategy and mounter tests. Is this what you're looking for? Or if not, is there an example you can point me to?
I meant a test that does something, switches the feature gate, then does something else and checks the result.
This one is a nice (and simple) example:
https://github.com/kubernetes/kubernetes/blob/b7c82bb83c1b3933b99fbc5fdcffa59fd6441617/pkg/registry/networking/networkpolicy/strategy_test.go#L246-L281
[I'm not going to block this GA on this at this point, just checking where we are and building awareness that things are at least to some extent possible. FWIW, I noted an AI for myself to extend the comment for that question in the template]
Ack, thanks for the example. I added a TODO item in the "Ephemeral inline volumes unit tests" section to follow up on this. It should be possible to add something similar for this feature.
Thanks!
> also needs to support ephemeral inline volumes, and the pod spec would need
> to be updated to make use of the feature.
>
> A rollback (or disabling the feature flag) may impact running workloads,
This paragraph seems to be focused on "new workloads we're trying to start".
What about pods that use CSI inline volumes and are already running? Is there anything that could lead to their failure, either under normal circumstances, or if they fail for some reason and need to be restarted by kubelet (e.g. due to an Always restart policy)?
I reworded the rollout section to (hopefully) clarify. I can't think of any reason why rollout would impact running pods, even in the case of pod restarts.
OK - the previous paragraph now looks ok to me.
Regarding this one ("A rollback..."), I would make it more explicit that workloads that are already running are fine, and that it may be problematic if we want to start new ones or potentially need to restart the running ones (if kubelet no longer has the feature gate enabled, then it will not make the appropriate mount/unmount calls, right?).
So (assuming I'm correct - please challenge me if I'm missing something), I would change it to something like:
> A rollback (or disabling the feature flag) will not impact already running workloads, however starting new ones (as well as potentially restarting ones that failed for some reason) would fail. Additionally, kubelet will fail to fully cleanup after pods that were running and taking advantage of inline volumes.
> *It is highly recommended to delete any pods using this feature before disabling it*.
Does that make sense? Is that correct?
It makes sense, with one caveat. If the feature flag is disabled, kubelet will still attempt to unmount existing volumes when requested to do so (i.e. there is no feature flag check when unmounting):
https://github.com/kubernetes/kubernetes/blob/66445662ad7b65ffdb144b9665608f7b6a4b57a6/pkg/volume/csi/csi_mounter.go#L380-L382
But it will not attempt to mount new volumes using this feature while it's disabled.
I changed your wording slightly to "kubelet may fail to fully cleanup", since kubelet will request `NodeUnpublishVolume` but cleanup then depends on the individual CSI driver, and we still recommend deleting pods using this feature before disabling it.
> ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
>
> Compared to CSI persistent volumes, there should be no increase in the amount of time
nit: it's worth clarifying that mount time is on the critical path for the pod-startup SLO:
https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
And configmaps/secrets are actually (at least as of now) counted as part of stateless pods, which means that we may see a regression.
So following up on that (wearing my scalability hat now): do we have any numbers about this now?
> nit: it's worth clarifying that mount time is on the critical path for the pod-startup SLO

I've added a note on this in the KEP with a link to the pod startup latency SLO.

> And configmaps/secrets are actually (at least as of now) counted as part of stateless pods, which means that we may see a regression.

Ah, but the definition in the pod startup latency SLO states:

> [2] A stateless pod is defined as a pod that doesn't mount volumes with sources other than secrets, config maps, downward API and empty dir.

If a pod uses CSI inline volumes, that is still a type of volume, so it's not really a stateless pod in that case.

> So following up on that (wearing my scalability hat now): do we have any numbers about this now?

No, I don't have any numbers readily available. What kind of comparison would be interesting to see?
> If a pod uses CSI inline volumes, that is still a type of volume, so it's not really a stateless pod in that case.

OK - you're right. Thanks for correcting me.

> No, I don't have any numbers readily available. What kind of comparison would be interesting to see?

So what I'm looking for is a comparison: e.g. if I have a pod with 5 secrets, it takes X seconds now, but Y seconds if we mount those secrets via CSI inline volumes.
[Again - non-blocking, as it's not a replacement, it's a new way of doing stuff. I'm asking more to build awareness about that.]
> So what I'm looking for is a comparison: e.g. if I have a pod with 5 secrets, it takes X seconds now, but Y seconds if we mount those secrets via CSI inline volumes.

OK, I'd expect this to vary based on the CSI driver used, but I agree it would be useful to document this comparison so we at least have some idea of how big the difference is for a basic use case like this, to set expectations for users of this feature. I've added a TODO line in this section to follow up on it during GA testing.
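The comparison being discussed contrasts two ways of delivering the same secrets to a pod. Roughly, these are the two volume stanzas one would time against each other; the second assumes the Secrets Store CSI Driver, and the secret and `SecretProviderClass` names are placeholders:

```yaml
# Variant A: built-in secret volume (the baseline counted in the pod-startup SLO)
volumes:
  - name: creds
    secret:
      secretName: my-secret                      # placeholder secret name
---
# Variant B: the same secrets surfaced via an inline CSI volume
# (fetched from an external provider at mount time, so extra latency is expected)
volumes:
  - name: creds
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: my-provider-class   # placeholder SPC name
```

Variant A is served from data already cached by kubelet, while Variant B involves a gRPC round trip to the driver and typically a call to an external secrets store, which is why a per-driver measurement is the meaningful one.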
Thanks!
/assign @xing-yang
Just a few more nits, but I'm overall fine with this PRR.
However, I will wait with approval until this is approved by the SIG leads.
I want to collapse these changes into a single commit before the PR merges, but I'll do that later today, in case anyone is looking at the incremental changes.
@dobsonj Can you squash your commits?
/approve
/lgtm
/assign @wojtek-t
> - Feature flag enabled by default
> - CSI drivers can support both ephemeral inline volumes and persistent volumes
>
> #### GA
With the new security guidance, should we also add an item to make sure our sponsored nfs and smb drivers align with this?
@msau42 @xing-yang can you re-lgtm? That one addition was the only change I made.
Sure.
/lgtm
This looks great now - thanks!
> ###### Are there any tests for feature enablement/disablement?
>
> Yes, the unit tests will test with the feature gate enabled and disabled.
Thanks!
> Additionally, kubelet may fail to fully cleanup after pods that were
> running and taking advantage of inline volumes.
>
> *It is highly recommended to delete any pods using this feature before disabling it*.
Looks great - thanks!
> ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
>
> Compared to CSI persistent volumes, there should be no increase in the amount of time
Thanks!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dobsonj, wojtek-t, xing-yang
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
KEP-596: promote CSIInlineVolume to GA
KEP Preview: https://github.com/dobsonj/enhancements/tree/kep-596-ga/keps/sig-storage/596-csi-inline-volumes
/cc @jsafrane