Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI inline volume support KEP with suggested updates #790

Merged
merged 1 commit into from
Feb 26, 2019
Merged

CSI inline volume support KEP with suggested updates #790

merged 1 commit into from
Feb 26, 2019

Conversation

vladimirvivien
Copy link
Member

@vladimirvivien vladimirvivien commented Jan 30, 2019

This is to add changes suggested by @liggitt after KEP merged.
These are mostly cosmetic with design updates that still keep the KEP status implementable.

Original KEP - #716

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 30, 2019
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/pm sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 30, 2019
@vladimirvivien
Copy link
Member Author

/sig storage

@vladimirvivien
Copy link
Member Author

/assign @saad-ali

PTAL

@vladimirvivien
Copy link
Member Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 30, 2019
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 31, 2019
@vladimirvivien
Copy link
Member Author

/rm hold

@vladimirvivien
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 31, 2019
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 1, 2019
@vladimirvivien
Copy link
Member Author

@kfox1111 @saad-ali PTAL

keps/sig-storage/20190122-csi-inline-volumes.md Outdated Show resolved Hide resolved
keps/sig-storage/20190122-csi-inline-volumes.md Outdated Show resolved Hide resolved
keps/sig-storage/20190122-csi-inline-volumes.md Outdated Show resolved Hide resolved
keps/sig-storage/20190122-csi-inline-volumes.md Outdated Show resolved Hide resolved
keps/sig-storage/20190122-csi-inline-volumes.md Outdated Show resolved Hide resolved
* The Kubelet and external CSI components must ensure secret references can only be used with the namespace from inline pod spec.
* Persistent inline volumes will not support secrets during attach/detach operations (see above for detail)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should limit the secret fields to ones we intend to support (drop ControllerPublishSecretRef) in the inline CSIVolumeSource typedef above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Agreed!

@liggitt
Copy link
Member

liggitt commented Feb 1, 2019

this flow makes a lot more sense to me, thanks for the update

Copy link

@kfox1111 kfox1111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@liggitt
Copy link
Member

liggitt commented Feb 5, 2019

The external CSI A/D controller copies whole VolumeSource from Pod into VolumeAttachment. This allows external CSI attacher to detach volumes for deleted pods without keeping any internal database of attached VolumeSources.

It's worth calling out that when multiple pods with the same node + driver + volumeHandle exist, only the volumeSource of the first pod encountered by the attacher controller is effective in specifying parameters used by the attacher. Since that can result in extremely hard-to-understand behavior, it's important to make sure the approvers understand/agree with that outcome.

* The driver can support PV/PVC originated volume (normal operation).
* If the volume originates from a pod spec, it is assumed that its lifecycle is `persistent`.
## Ephemeral inline volume proposal
A CSI driver may be able to support either PV/PVC-originated or pod spec originated volumes. When a volume definition is embedded inside a pod spec, it is considered to be an `ephemeral inline` volume request and can only participate in *mount/unmount* volume operation calls. Ephemeral inline volume requests have the following characteristics:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we added a flag in CSIDriver CRD to ensure that the driver was capable of operating properly as an ephemeral driver. I think this is still important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the flag be used to gate?

Copy link
Member

@liggitt liggitt Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would prevent calling mount/unmount on a CSI driver for an inline ephemeral volume source, if the driver did not support ephemeral volumes

otherwise we're assuming a persistent volume CSI driver will fail in a safe/performant/reliable way when asked to mount a bogus (generated) volumeHandle for an inline ephemeral volumesource, which doesn't seem great.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok to summarize, it would be used to gate the auto-generation of a volume handle during mount/unmount. And this is to protect persistent drivers, not to protect ephemeral drivers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is to protect persistent drivers, not to protect ephemeral drivers.

protection would go both ways, I'd imagine (ephemeral drivers wouldn't be eligible for being called to satisfy storageclass or PV-related things)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 OK. The flag will be a string where the current supported value is "ephemeral". Future implementation may define a new value. A value of "persistent" does not make sense right now since the code will not support that feature.

I will update the KEP again to reflect that and the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still do want persistent right? Persistent supports provision, attach, resize, mount, etc, but does not support volume handle generation. Ephemeral only supports mount/unmount and volume handle generation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42
It sounded like here (kubernetes/kubernetes#74086 (comment)) that persistent inline CSI would not be supported at all. Maybe I am missing something. Let me know what you think. If you think we will come back (beta) to support persistent inline CSI, then I will word the KEP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Persistent inline is not supported. However, it may still be useful to distinguish between persistent and ephemeral drivers in general due to their different lifecycles

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 I see.
OK. I see your point. I will rename the flag to reflect that distinction CSIDriver.Mode = {ephemeral|persistent}. Thanks for clarifying.

@kfox1111
Copy link

Just the one issue I think.

* The driver can support PV/PVC originated volume (normal operation).
* If the volume originates from a pod spec, it is assumed that its lifecycle is `persistent`.
## Ephemeral inline volume proposal
A CSI driver may be able to support either PV/PVC-originated or pod spec originated volumes. When a volume definition is embedded inside a pod spec, it is considered to be an `ephemeral inline` volume request and can only participate in *mount/unmount* volume operation calls. Ephemeral inline volume requests have the following characteristics:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the flag be used to gate?

keps/sig-storage/20190122-csi-inline-volumes.md Outdated Show resolved Hide resolved
keps/sig-storage/20190122-csi-inline-volumes.md Outdated Show resolved Hide resolved
* Two pods accessing the same inline persistent volumes
* CSI attacher code invokes driver operation during attach for persistent volumes
* CSI detacher code invokes driver operation during detach for persistent volumes
* Two pods accessing the same ephemeral inline volumes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what "the same ephemeral inline volumes" means... the volumes are per pod

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say you create a deployment with:

volumes:
- name: foo
  csi:
    driver: myephemeraldriver

Two pods could land on the same node. If pod A writes to the foo volume, and pod B reads from it, it really should be a different volume between the two, even though they are defined the same in the pod. We should check to ensure that is the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, verifying an identical inline definition results in distinct volume instances. sounds good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a single pod with two inline ephemeral volumes with identical definitions (other than name) might be easier to test (guaranteed to be on the same node) and test uniqueness down to the subpod level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both test cases are important to have. One validates that the volumehandle is unique per volume name. And the other validates that the volumehandle is unique per pod.

You can use pod affinity to make sure two pods are on the same node.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, both tests are important. While writing the KEP we found problems in the KEP for each case uniquely. The KEP has it fixed now, but should have tests. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update language to indicate what's being tested

  1. verifying an identical inline definition results in distinct volume instances
  2. a single pod with two inline ephemeral volumes with identical definitions (other than name)

@vladimirvivien
Copy link
Member Author

@msau42 @liggitt @kfox1111 PTAL

@vladimirvivien
Copy link
Member Author

@jsafrane PTAL

@msau42
Copy link
Member

msau42 commented Feb 22, 2019

lgtm. Will let @kfox1111 and @liggitt review

@msau42 msau42 mentioned this pull request Feb 22, 2019
@vladimirvivien
Copy link
Member Author

@liggitt @saad-ali PTAL, Looking for a lgtm/approval so this can be merged.

@vladimirvivien
Copy link
Member Author

@msau42 @kfox1111 PTAL

@@ -151,12 +151,20 @@ type CSIVolumeSource struct {
```

### Driver inline mode
To indicate that a driver is setup to handle ephemeral inline volume requests, the existing `CSIDriver` object will be updated to include boolean attribute `InlineMode`. When `CSIDriver.InlineMode == true` the followings are assumed:
To indicate that the driver will support ephemeral inline volume requests, the existing `CSIDriver` object will be extended to include attribute `InlineMode`. Currently the only InlineMode that will be supported is the `ephemeral` mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I think we should go back to the field from the original proposal that had both "ephemeral" and "persistent". It makes it easier to add runtime validation when we make all the CSI calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree. I wasn't anticipating this much of a rework in response to Tim's suggestion to defer persistent inline volumes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's we'll do - #790 (comment)

The rework is more design then code. The code will be simpler once we agree on the KEP.

@vladimirvivien
Copy link
Member Author

Thanks for all of the reviews.

creation-date: 2019-01-22
status: implementable
---

# Inline CSI volumes
# Ephemeral inline CSI volumes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Inline ?

@vladimirvivien
Copy link
Member Author

@msau42 @liggitt @kfox1111 PTAL

* The driver will not receive separate gRPC calls for provisioning, attaching, detaching, and deleting of volumes.
* The driver is responsible for implementing steps to ensure the volume is created and made avaiable to the pod during mount call
* The driver is responsible for implementing steps to delete and clean up any volume and resources during the unmount call.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe two more while we're at it:

  • The mount could happen multiple times for the same volumeHandle. The first should allocate all backing volume data.
  • The unmount could happen multiple times for the same volumeHandle. The first should remove any backing volume data. It will be called at a minimum one time. No volume accesses will be made after the first call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kfox1111 I will highlight your points.

* Ensure volumeHandle conforms to resource naming format
* CSIVolumeSource info persists in CSI json file during mount/unmount
* Ensure Kubelet skips attach/detach when `CSIDriver.InlineMode = true`
* Ensure Kubelet skips inline logic when `CSIDriver.InlineMode = false`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a couple of references to InlineMode here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok thanks.


When `CSIDriver.Mode == <not specified>` or when `CSIDriver.Mode == persistent`, the driver will function as normal supporting only PV/PVC-requested volumes and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Mode be defaulted to persistent?

Copy link
Member Author

@vladimirvivien vladimirvivien Feb 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mode defaults to <unspecified> so we don't force reconfiguration of existing deployed drivers. So unspecified === persistent.

* Kubelet will auto-generate a volumeHandle based on `podUID` and `pod.spec.volume[x].name` (see above for detail).
* CSI driver will receive mount-like calls (NodePublish) with generated paths and generated volumeHandle.

Since ephemeral volume requests will participate in only the mount/unmount volume operation phase, CSI drivers are responsible for implementing all necessary operations during that phase (i.e. create, mount, unmount, delete, etc). For instance, a driver would be responsible for provisioning any new volume resource during the `NodePublishStage` and for tearing down these resources during the `NodeUnpublish` call.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: NodePublish stage. To not confuse it with NodeStage

@msau42
Copy link
Member

msau42 commented Feb 25, 2019

lgtm, just some minor comments.

@msau42
Copy link
Member

msau42 commented Feb 26, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 26, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, msau42, saad-ali, vladimirvivien

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit fc820ae into kubernetes:master Feb 26, 2019
@kfox1111
Copy link

Great work everyone! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants