-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSI inline volume support KEP with suggested updates #790
CSI inline volume support KEP with suggested updates #790
Conversation
/sig storage |
/assign @saad-ali PTAL |
/hold |
/rm hold |
/hold cancel |
* The Kubelet and external CSI components must ensure secret references can only be used with the namespace from inline pod spec. | ||
* Persistent inline volumes will not support secrets during attach/detach operations (see above for detail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should limit the secret fields to ones we intend to support (drop ControllerPublishSecretRef) in the inline CSIVolumeSource typedef above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Agreed!
this flow makes a lot more sense to me, thanks for the update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
It's worth calling out that when multiple pods with the same node + driver + volumeHandle exist, only the volumeSource of the first pod encountered by the attacher controller is effective in specifying parameters used by the attacher. Since that can result in extremely hard-to-understand behavior, it's important to make sure the approvers understand/agree with that outcome. |
* The driver can support PV/PVC originated volume (normal operation). | ||
* If the volume originates from a pod spec, it is assumed that its lifecycle is `persistent`. | ||
## Ephemeral inline volume proposal | ||
A CSI driver may be able to support either PV/PVC-originated or pod spec originated volumes. When a volume definition is embedded inside a pod spec, it is considered to be an `ephemeral inline` volume request and can only participate in *mount/unmount* volume operation calls. Ephemeral inline volume requests have the following characteristics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we added a flag in CSIDriver CRD to ensure that the driver was capable of operating properly as an ephemeral driver. I think this is still important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the flag be used to gate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would prevent calling mount/unmount on a CSI driver for an inline ephemeral volume source, if the driver did not support ephemeral volumes
otherwise we're assuming a persistent volume CSI driver will fail in a safe/performant/reliable way when asked to mount a bogus (generated) volumeHandle for an inline ephemeral volumesource, which doesn't seem great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok to summarize, it would be used to gate the auto-generation of a volume handle during mount/unmount. And this is to protect persistent drivers, not to protect ephemeral drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this is to protect persistent drivers, not to protect ephemeral drivers.
protection would go both ways, I'd imagine (ephemeral drivers wouldn't be eligible for being called to satisfy storageclass or PV-related things)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msau42 OK. The flag will be a string where the current supported value is "ephemeral". Future implementation may define a new value. A value of "persistent" does not make sense right now since the code will not support that feature.
I will update the KEP again to reflect that and the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still do want persistent right? Persistent supports provision, attach, resize, mount, etc, but does not support volume handle generation. Ephemeral only supports mount/unmount and volume handle generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msau42
It sounded like here (kubernetes/kubernetes#74086 (comment)) that persistent inline CSI would not be supported at all. Maybe I am missing something. Let me know what you think. If you think we will come back (beta) to support persistent inline CSI, then I will word the KEP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Persistent inline is not supported. However, it may still be useful to distinguish between persistent and ephemeral drivers in general due to their different lifecycles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msau42 I see.
OK. I see your point. I will rename the flag to reflect that distinction CSIDriver.Mode = {ephemeral|persistent}
. Thanks for clarifying.
Just the one issue I think. |
* The driver can support PV/PVC originated volume (normal operation). | ||
* If the volume originates from a pod spec, it is assumed that its lifecycle is `persistent`. | ||
## Ephemeral inline volume proposal | ||
A CSI driver may be able to support either PV/PVC-originated or pod spec originated volumes. When a volume definition is embedded inside a pod spec, it is considered to be an `ephemeral inline` volume request and can only participate in *mount/unmount* volume operation calls. Ephemeral inline volume requests have the following characteristics: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the flag be used to gate?
* Two pods accessing the same inline persistent volumes | ||
* CSI attacher code invokes driver operation during attach for persistent volumes | ||
* CSI detacher code invokes driver operation during detach for persistent volumes | ||
* Two pods accessing the same ephemeral inline volumes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what "the same ephemeral inline volumes" means... the volumes are per pod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say you create a deployment with:
volumes:
- name: foo
csi:
driver: myephemeraldriver
Two pods could land on the same node. If pod A writes to the foo volume, and pod B reads from it, it really should be a different volume between the two, even though they are defined the same in the pod. We should check to ensure that is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, verifying an identical inline definition results in distinct volume instances. sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a single pod with two inline ephemeral volumes with identical definitions (other than name) might be easier to test (guaranteed to be on the same node) and test uniqueness down to the subpod level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both test cases are important to have. One validates that the volumehandle is unique per volume name. And the other validates that the volumehandle is unique per pod.
You can use pod affinity to make sure two pods are on the same node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, both tests are important. While writing the KEP we found problems in the KEP for each case uniquely. The KEP has it fixed now, but should have tests. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update language to indicate what's being tested
- verifying an identical inline definition results in distinct volume instances
- a single pod with two inline ephemeral volumes with identical definitions (other than name)
@jsafrane PTAL |
@@ -151,12 +151,20 @@ type CSIVolumeSource struct { | |||
``` | |||
|
|||
### Driver inline mode | |||
To indicate that a driver is setup to handle ephemeral inline volume requests, the existing `CSIDriver` object will be updated to include boolean attribute `InlineMode`. When `CSIDriver.InlineMode == true` the followings are assumed: | |||
To indicate that the driver will support ephemeral inline volume requests, the existing `CSIDriver` object will be extended to include attribute `InlineMode`. Currently the only InlineMode that will be supported is the `ephemeral` mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I think we should go back to the field from the original proposal that had both "ephemeral" and "persistent". It makes it easier to add runtime validation when we make all the CSI calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree. I wasn't anticipating this much of a rework in response to Tim's suggestion to defer persistent inline volumes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's we'll do - #790 (comment)
The rework is more design then code. The code will be simpler once we agree on the KEP.
Thanks for all of the reviews. |
creation-date: 2019-01-22 | ||
status: implementable | ||
--- | ||
|
||
# Inline CSI volumes | ||
# Ephemeral inline CSI volumes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Inline ?
* The driver will not receive separate gRPC calls for provisioning, attaching, detaching, and deleting of volumes. | ||
* The driver is responsible for implementing steps to ensure the volume is created and made avaiable to the pod during mount call | ||
* The driver is responsible for implementing steps to delete and clean up any volume and resources during the unmount call. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe two more while we're at it:
- The mount could happen multiple times for the same volumeHandle. The first should allocate all backing volume data.
- The unmount could happen multiple times for the same volumeHandle. The first should remove any backing volume data. It will be called at a minimum one time. No volume accesses will be made after the first call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kfox1111 I will highlight your points.
* Ensure volumeHandle conforms to resource naming format | ||
* CSIVolumeSource info persists in CSI json file during mount/unmount | ||
* Ensure Kubelet skips attach/detach when `CSIDriver.InlineMode = true` | ||
* Ensure Kubelet skips inline logic when `CSIDriver.InlineMode = false` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still a couple of references to InlineMode here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks.
|
||
When `CSIDriver.Mode == <not specified>` or when `CSIDriver.Mode == persistent`, the driver will function as normal supporting only PV/PVC-requested volumes and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should Mode be defaulted to persistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mode
defaults to <unspecified>
so we don't force reconfiguration of existing deployed drivers. So unspecified
=== persistent
.
* Kubelet will auto-generate a volumeHandle based on `podUID` and `pod.spec.volume[x].name` (see above for detail). | ||
* CSI driver will receive mount-like calls (NodePublish) with generated paths and generated volumeHandle. | ||
|
||
Since ephemeral volume requests will participate in only the mount/unmount volume operation phase, CSI drivers are responsible for implementing all necessary operations during that phase (i.e. create, mount, unmount, delete, etc). For instance, a driver would be responsible for provisioning any new volume resource during the `NodePublishStage` and for tearing down these resources during the `NodeUnpublish` call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: NodePublish
stage. To not confuse it with NodeStage
lgtm, just some minor comments. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liggitt, msau42, saad-ali, vladimirvivien The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Great work everyone! :) |
This is to add changes suggested by @liggitt after KEP merged.
These are mostly cosmetic with design updates that still keep the KEP status implementable.
Original KEP - #716