-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal for CSI Cluster Registry #2514
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
# CSI Cluster Registry Design Doc | ||
|
||
***Status:*** Pending | ||
|
||
***Version:*** Alpha | ||
|
||
***Author:*** Saad Ali ([@saad-ali](https://github.com/saad-ali), [email protected]) | ||
|
||
## Terminology | ||
|
||
Term | Definition | ||
---|--- | ||
Container Storage Interface (CSI) | A specification attempting to establish an industry standard interface that Container Orchestration Systems (COs) can use to expose arbitrary storage systems to their containerized workloads. | ||
CSI Volume Plugin | A new, in-tree volume plugin that acts as an adapter and enables out-of-tree, third-party CSI volume drivers to be used in Kubernetes. | ||
CSI Volume Driver | An out-of-tree CSI compatible implementation of a volume plugin that can be used in Kubernetes through the Kubernetes CSI Volume Plugin. | ||
|
||
# Summary | ||
|
||
This document proposes a "CSI cluster registration mechanism". | ||
|
||
This mechanism will enable: | ||
* A CSI volume driver to register itself with Kubernetes when it is deployed. | ||
* A CSI volume driver to customize how Kubernetes interacts with it (e.g. skip attach process because the driver doesn't support ControllerPublish, etc.). | ||
* A user or cluster admin to easily discover which CSI volume drivers are deployed on their Kubernetes cluster. | ||
* Be optional -- a driver may choose not to use it (and will get the default set of behaviors). | ||
|
||
|
||
## Background & Motivations | ||
|
||
Kubernetes supports the Container Storage Interface (CSI) to enable third party storage developers to deploy volume drivers exposing new storage systems in Kubernetes without having to touch the core Kubernetes code. | ||
Support for CSI was introduced as alpha in Kubernetes v1.9, and moved to beta in v1.10. | ||
See "CSI Volume Plugins in Kubernetes Design Doc" in the "Links" section below for details on how an arbitrary CSI volume driver is deployed and interacts with Kubernetes. | ||
The beta implementation of CSI has a number of limitations, including: | ||
* CSI drivers must be deployed with the provided CSI external-attacher sidecar container, even if they don’t implement ControllerPublishVolume. | ||
* Meaning a CSI volume driver deployed on kubernetes has to deploy a CSI `external-attacher` container even if the volume driver doesn't require a "volume attach" operation. | ||
* In this case, the CSI `external-attacher` container basically does a no-op in response to Kubernetes `VolumeAttachment` objects to allow Kubernetes to continue with the mounting process. | ||
* Users and cluster admins have no easy way to discover what CSI volume drivers are deployed on their Kubernetes cluster. | ||
|
||
The proposed "CSI cluster registration mechanism" should address these issues. | ||
|
||
### Links | ||
|
||
* [Container Storage Interface (CSI) Spec](https://github.com/container-storage-interface/spec/blob/master/spec.md) | ||
* [CSI Volume Plugins in Kubernetes Design Doc](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md) | ||
|
||
## Objective | ||
|
||
The objective of this document is to document all the requirements for enabling a cluster level registry for a CSI compliant volume drivers in Kubernetes. | ||
|
||
## Goals | ||
|
||
* Allow a CSI volume driver (that opts-in to new mechanism) to configure how Kubernetes should interact with it. | ||
* Improve discoverability by users/cluster admins of a CSI volume driver (that opts-in to new mechanism) deployed on the Kubernetes cluster. | ||
|
||
## Non-Goals | ||
|
||
* Define how CSI volume driver should be deployed on Kubernetes. | ||
* Require use of new CSI cluster registry mechanism for all CSI volume drivers deployed on Kubernetes (the mechanism will be optional/opt-in). | ||
|
||
## Design Overview | ||
|
||
A new custom resource definition will automatically be installed on Kubernetes clusters. | ||
|
||
Upon deployment a driver MAY create a new custom resource object. | ||
|
||
## Design Details | ||
|
||
### `CSIDriver` Object | ||
|
||
#### Proposed API | ||
|
||
```go | ||
// CSIDriver captures information about a Container Storage Interface (CSI) | ||
// volume driver deployed on the cluster. | ||
// | ||
// CSIDriver objects are non-namespaced. | ||
type CSIDriver struct { | ||
metav1.TypeMeta `json:",inline"` | ||
|
||
// Standard object metadata. | ||
// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata | ||
metav1.ObjectMeta `json:"metadata"` | ||
|
||
// Specification describing the CSI volume driver and any custom | ||
// configuration for it. | ||
Spec CSIDriverSpec `json:"spec"` | ||
} | ||
|
||
// CSIDriverList is a collection of CSIDriver objects. | ||
type CSIDriverList struct { | ||
metav1.TypeMeta `json:",inline"` | ||
|
||
// Standard list metadata | ||
// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata | ||
metav1.ListMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` | ||
|
||
// Items is the list of CSIDrivers | ||
Items []CSIDriver `json:"items"` | ||
} | ||
|
||
// CSIDriverSpec is the specification of a CSIDriver. | ||
type CSIDriverSpec struct { | ||
// Driver indicates the name of the CSI driver that this object refers to. | ||
// This MUST be the same name returned by the CSI GetPluginName() call for | ||
// that driver. | ||
Driver string `json:"driver"` | ||
|
||
// Indicates this CSI volume driver requires an attach operation (because it | ||
// implements the CSI ControllerPublishVolume() method), and that Kubernetes | ||
// should call attach and wait for any attach operation to complete before | ||
// proceeding to mounting. | ||
// If value is not specified, default is true -- meaning attach will be | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this a default as in the normal defaults (will be set in the object) or is it the implied behavior? To set a default will require admission control... Just clarify. |
||
// called. | ||
// +optional | ||
AttachRequired *bool `json:"attachRequired"` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably a dump question. Isn't this value already exposed from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem is where the call is being made. Kubelet is not doing the attach. A pod floating in the cluster takes attachment requests, converts them into the driver calls for attach/detach. You could have the attacher sidecar running, probe the driver as you say, and then determine that it doesn't ever have to do anything. That does work today. But that's a fair amount of extra pluming running in the cluster that isn't needed. The flag above tells Kubernetes to skip entirely any of the handshaking at the cluster level required to do that attach so the sidecar that does nothing and its driver doesn't even need to exist. |
||
|
||
// Indicates this CSI volume driver requires additional pod information | ||
// (like podName, podUID, etc.) during mount operations. | ||
// If this is set to true, Kubelet will pass pod information as | ||
// VolumeAttributes in the CSI NodePublishVolume() calls. | ||
// If value is not specified, default is false -- meaning pod information | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not pass this always? |
||
// will not be passed on mount. | ||
// +optional | ||
PodInfoRequiredOnMount *bool `json:"podInfoRequiredOnMount"` | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. EDITED: type VolumeHandleMode string
const (
AutomaticVolumeHandleGeneration VolumeHandleMode = "automaticVolumeHandleGeneration"
NoVolumeHandleGeneration VolumeHandleMode = "noVolumeHandleGeneration"
)
< snip >
type CSIDriverSpec struct {
< snip >
// Indicates how the volumeHandle for inline CSI volumes will be handled during volume operations.
// If set to AutomaticVolumeHandleGeneration, this provides a hint to the Kubelet to
// automatically generate handles for the inline volume.
//
// If set to `NoVolumeHandleGeneration`, this is a hint to Kubelet not to automatically
// generate handles for volumes when not provided.
VolumeHandleMode *VolumeHandleMode `json:"volumeHandleMode"`
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this be limited only to inline volumes? Or handles for PVs should be (optionally) autogenerated too? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm... pv's too.... say someone used the image driver with a pv (for some reason). It would break horribly if it wasn't given proper handles. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think for now the language and implementation should be limited to Inline. |
||
|
||
``` | ||
|
||
#### CRD Installation | ||
|
||
The `CSIDriver` object schema will checked in to kubernetes/kubernetes repo under a storage directory. | ||
The `CSIDriver` `CustomResourceDefinition` (CRD) will be installed by a new Kubernetes controller that is responsible for ensuring required CRDs are installed. | ||
The controller will periodically verify that the CRD is still installed, and recreate it, if it is not. | ||
|
||
#### CR Creation | ||
|
||
When a CSI volume driver is deployed on Kubernetes, it may optionally, register it self with Kubernetes by creating a new `CSIDriver` object. | ||
The Kubernetes team will modify the CSI [`driver-registrar` container](https://github.com/kubernetes-csi/driver-registrar) to automatically do this. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm.... Thinking about this some more, with my operator hat on, I'd rather it not. If the driver-registrar were to do it, then we'd have to rbac it well in order to restrict its access to a safe subset of things. but if you are going through that much effort, you could instead just upload the crd directly in whatever csi packaging is provided, like via k8s yaml / helm chart to the cluster using your admin credentials instead and not worry about it at all. Similar to some of the reasoning Rook took when they tightened security (https://github.com/rook/rook/blob/master/design/security-model.md for details) |
||
If kubelet driver registration is enabled, the kubelet will automatically do this as part of plugin registration. | ||
The driver must set the `Driver` field to the same name returned by the CSI `GetPluginName()` call for that driver. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So does this mean that a CSI driver has to ship with some mechanism of instantiating CRs when it is deployed? Currently obviously that is not part of CSI spec, so how will that get done? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
CSI driver authors can choose to but don't have to
The same way all custom k8s behavior is done, side car containers provided by us (e.g. driver-registrar, external-attacher, etc.). |
||
The driver may set any optional configuration fields (like `SkipAttach`) as appropriate. | ||
When the driver is ready to serve, it must set `Ready` in the status to `true`. | ||
|
||
#### Upgrade/Downgrade | ||
This change is backwards compatible. | ||
Existing CSI drivers that are already deployed will not create a `CSIDriver` object. | ||
And Kubernetes will continue to interact with those drivers as it does today. | ||
New drivers MAY create a `CSIDriver` object to change how Kubernetes interacts with them. | ||
|
||
If a Kubernetes cluster is downgraded to a version that does not support `CSIDriver`, but continues to have a CSI driver deployed on it that creates a `CSIDriver` object and expects non-default behavior, Kubernetes will not be able to interact with it correctly (e.g. it may call attach when driver requests attach not to be called resulting in the volumes not mounting). Therefore, cluster admins must ensure if they downgrade, that they also change the CSI driver, if needed. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's very likely that the driver will be the same in this case. If I have Gluster CSI driver working in 1.11 with external attacher, I can use the same driver in 1.12 with the external attacher or I may create CSIDriver for it and stop the attacher. The driver itself is the same in all three cases, PVs are the same and pod definitions are the same. Downgrade from a non-attachable driver with a CSIDriver instance to 1.11 is not that easy as you describe, some volumes can be mounted to running pods and we need to make sure they remain mounted after downgrade and unmounted when the pod is deleted. Let's discuss in #2523 |
||
|
||
## Alternatives Considered | ||
Kubernetes already has a kubelet plugin registration mechanism. | ||
So we considered putting this information in the `Node.Status` field. | ||
We realized that there are some plugin level attributes and forcing them in to node level fields could result in inconsistency. | ||
Different nodes could report different values for the same plugin attribute (like supports attach or not) for the same driver. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we require that ObjectMeta.Name is the CSI driver name? How else can we ensure that each CSI driver has just one CSIDriver instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should require it. We should have validation webhook to verify that you can't create two objects with the same
CSIDriverSpec.Driver
fields.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation is IMO racy. There may be several API servers running in parallel.
And this is wrong, because X might have been rejected by API server (e.g. another webhook failed) and Y should have been allowed.