Proposal for CSI Cluster Registry #2514
Conversation
```go
// (because it does not implement the ControllerPublishVolume() method),
// and that Kubernetes should not call or wait for any attach operation and
// just skip to mounting.
SkipAttach bool `json:"skipAttach"`
```
Just nitpicking: I think that `CSIPluginSpec` should be a specification of the driver, not of how Kubernetes should handle it. And the driver does not require attach, so `RequiresAttach` or something similar would be more appropriate.
Agree with the renaming.
```go
}

// CSIPluginSpec is the specification of a CSIPlugin.
type CSIPluginSpec struct {
```
Please add a flag to pass Pod info in `NodePublish`.
Done.
The Kubernetes team will modify the CSI [`driver-registrar` container](https://github.com/kubernetes-csi/driver-registrar) to automatically do this.
The plugin must set the `Driver` field to the same name returned by the CSI `GetPluginName()` call for that driver.
The plugin may set any optional configuration fields (like `SkipAttach`) as appropriate.
When the plugin is ready to serve, it must set `Ready` in the status to `true`.
Who sets `Ready` when the driver uses registration by driver-registrar?
@jsafrane If the CSIPlugin object had something like a `Registered` field set by the kubelet's informer, then driver-registrar could watch for it and set the CSIPlugin to `Ready` once it has been seen. In that case there is no race.
Good question. It would have to be the kubelet.
#### CRD Installation

The `CSIPlugin` object schema will be checked in to the kubernetes/kubernetes repo under a storage directory.
The `CSIPlugin` `CustomResourceDefinition` (CRD) will be installed by the existing Kubernetes attach/detach controller that is part of the kube-controller-manager binary.
I'd prefer a single place for registering CRDs for all components; there will be more coming if CRDs are the right way to add new API objects now.
I agree. But it would be easier to use an existing controller for the alpha version.
I changed it to say new controller -- if we have enough time, I'll do a new controller; otherwise I'll just use the existing controller and we can add a standalone controller later for beta.
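For illustration, a minimal sketch of what controller-side installation of the CRD could look like, whichever controller ends up owning it. The apiextensions v1 client is used here for concreteness, and the `csi.storage.k8s.io`/`v1alpha1` group/version, resource names, and permissive schema are assumptions for the sketch, not something this proposal fixes.

```go
package crdinstall

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

// installCSIDriverCRD creates the CSIDriver CRD if it does not exist yet.
// Group, version, and the permissive schema below are illustrative.
func installCSIDriverCRD(config *rest.Config) error {
	client, err := apiextensionsclient.NewForConfig(config)
	if err != nil {
		return err
	}
	preserveUnknown := true
	crd := &apiextensionsv1.CustomResourceDefinition{
		ObjectMeta: metav1.ObjectMeta{Name: "csidrivers.csi.storage.k8s.io"},
		Spec: apiextensionsv1.CustomResourceDefinitionSpec{
			Group: "csi.storage.k8s.io",
			Scope: apiextensionsv1.ClusterScoped,
			Names: apiextensionsv1.CustomResourceDefinitionNames{
				Plural:   "csidrivers",
				Singular: "csidriver",
				Kind:     "CSIDriver",
			},
			Versions: []apiextensionsv1.CustomResourceDefinitionVersion{{
				Name:    "v1alpha1",
				Served:  true,
				Storage: true,
				Schema: &apiextensionsv1.CustomResourceValidation{
					OpenAPIV3Schema: &apiextensionsv1.JSONSchemaProps{
						Type:                   "object",
						XPreserveUnknownFields: &preserveUnknown,
					},
				},
			}},
		},
	}
	_, err = client.ApiextensionsV1().CustomResourceDefinitions().Create(
		context.TODO(), crd, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil // another controller or replica installed it first
	}
	return err
}
```

Whether this runs in the existing attach/detach controller for alpha or in a new standalone controller later is exactly the trade-off discussed above; the call itself is the same either way.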
```go
type CSIPluginStatus struct {
	// Indicates the volume plugin has been successfully deployed on the cluster
	// and is ready to use.
	Ready bool `json:"ready"`
```
Who is the consumer of this information? Shall the A/D controller postpone any attach operation until the CSI plugin gets Ready? Shall kubelet postpone mounting of volumes? Who sets it to `true`, and when? Most importantly, who sets it to `false`, and when? What should the A/D controller and kubelet do when the plugin becomes Unready?
For the initial version no consumers. In the future we can use this as one of the bits for smarter CSI volume aware scheduling.
```go
type CSIPluginStatus struct {
	// Indicates the volume plugin has been successfully deployed on the cluster
	// and is ready to use.
	Ready bool `json:"ready"`
```
IMO, this should be a condition, with a timestamp for when the driver became (un-)ready, a message explaining why, and so on.
I'm hesitant to expand the API for a hypothetical use case. I'd prefer to leave it simple or drop it altogether until we have a real use case. If I drop it, there would be nothing in `CSIPluginStatus`, so I would have to drop that for now too. What do you prefer?
I am torn on this. Currently I don't see any useful Status to report, but it may come in the future. Can we have an object with .Spec and no .Status?

If we use `Ready`, we need clear semantics for what it means and, most importantly, what it is good for. Is it enough to have one node with the CSI driver installed to make the global object ready? Is it enough to have one node with the CSI driver broken (failed liveness probe, driver not installed there, crash loop...) to make the global object not-ready?

> In the future we can use this as one of the bits for smarter CSI volume aware scheduling.

For smarter CSI volume aware scheduling I would base the decision on the availability / status of drivers on individual nodes, not on a global object. What should the scheduler do when the global object is not ready? All it can do is stop scheduling.
```go
// CSIPlugin captures information about a Container Storage Interface (CSI)
// volume plugin deployed on the cluster.
```
We call them CSI drivers everywhere else. Or did I miss something?
@jsafrane Addressed feedback, PTAL
It looks usable, though I still have some doubts about the `Ready` status. Time for API review?
```go
type CSIDriver struct {
	metav1.TypeMeta `json:",inline"`

	// Standard object metadata.
```
wrong indentation
Fixed
## Design Overview

A new custom resource will automatically be installed on Kubernetes clusters.
a new custom resource definition? CR is optional.
Fixed
A new custom resource will automatically be installed on Kubernetes clusters.

Upon deployment, a driver must create a new custom resource object.
Driver may create a new object? It's optional.
Fixed
```go
type CSIDriverStatus struct {
	// Indicates the volume driver has been successfully deployed on the cluster
	// and is ready to use.
	Ready bool `json:"ready"`
```
It's still unclear to me what "Ready" means. "the volume driver has been successfully deployed" - so all containers (the DaemonSet with node components and the Deployment with controller components) are running? When one of them fails, who sets `Ready=false`? If I add a new node, shall something set it to `Ready=false`, wait until the DaemonSet runs a new pod with node components there, and then set it back to `Ready=true`?
Will remove
We can also use kubernetes#2514 when it becomes available.
When a CSI volume driver is deployed on Kubernetes, it may optionally register itself with Kubernetes by creating a new `CSIDriver` object.
The Kubernetes team will modify the CSI [`driver-registrar` container](https://github.com/kubernetes-csi/driver-registrar) to automatically do this.
If kubelet driver registration is enabled, the kubelet will automatically do this as part of plugin registration.
The driver must set the `Driver` field to the same name returned by the CSI `GetPluginName()` call for that driver.
So does this mean that a CSI driver has to ship with some mechanism of instantiating CRs when it is deployed? Currently obviously that is not part of CSI spec, so how will that get done?
> So does this mean that a CSI driver has to ship with some mechanism of instantiating CRs when it is deployed?

CSI driver authors can choose to, but don't have to.

> Currently obviously that is not part of CSI spec, so how will that get done?

The same way all custom Kubernetes behavior is done: sidecar containers provided by us (e.g. driver-registrar, external-attacher, etc.).
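As a rough sketch of what that sidecar call could look like (the `csi.storage.k8s.io`/`v1alpha1` group/version and the spec field names are assumptions for illustration, not finalized API), using the dynamic client so the sidecar does not need generated clients for the CRD:

```go
package registration

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// registerCSIDriver creates a cluster-scoped CSIDriver object for the given
// driver. The driver name must match what the CSI driver reports for itself.
func registerCSIDriver(config *rest.Config, driverName string, attachRequired bool) error {
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		return err
	}
	gvr := schema.GroupVersionResource{
		Group:    "csi.storage.k8s.io",
		Version:  "v1alpha1",
		Resource: "csidrivers",
	}
	obj := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "csi.storage.k8s.io/v1alpha1",
		"kind":       "CSIDriver",
		"metadata":   map[string]interface{}{"name": driverName},
		"spec": map[string]interface{}{
			"driver":         driverName, // same name the CSI driver returns from GetPluginName()
			"attachRequired": attachRequired,
		},
	}}
	_, err = client.Resource(gvr).Create(context.TODO(), obj, metav1.CreateOptions{})
	return err
}
```

The operator-driven alternative discussed further down (shipping the object as YAML in the driver's packaging) needs no code at all, only RBAC for whoever applies it.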
```go
// Standard object metadata.
// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
metav1.ObjectMeta `json:"metadata"`
```
Shall we require that ObjectMeta.Name is the CSI driver name? How else can we ensure that each CSI driver has just one CSIDriver instance?
I don't think we should require it. We should have a validation webhook to verify that you can't create two objects with the same `CSIDriverSpec.Driver` field.
This validation is IMO racy. There may be several API servers running in parallel.
- API server A gets POST CSIDriver.name=X for a driver.
- API server B gets POST CSIDriver.name=Y for the same driver at the same time.
- Both call the validation webhook.
- The webhook gets two queries, possibly in parallel. Let's say the webhook synchronizes the requests and processes X first.
- The webhook can GET CSIDriver to check if it already exists, but in this case it does not return anything useful; X and Y were not created yet.
- The webhook allows X to be created. It remembers that X exists in a local cache.
- The webhook processes Y; it can check in the API server if X exists. It does not exist yet -- either the API server is slow or X was rejected by another webhook / validation.
- The webhook can check its internal cache to see if it allowed X. It was allowed, so it rejects Y.

And this is wrong, because X might have been rejected by the API server (e.g. another webhook failed) and Y should have been allowed.
@thockin @kubernetes/api-approvers PTAL
And Kubernetes will continue to interact with those drivers as it does today.
New drivers MAY create a `CSIDriver` object to change how Kubernetes interacts with them.

If a Kubernetes cluster is downgraded to a version that does not support `CSIDriver`, but continues to have a CSI driver deployed on it that creates a `CSIDriver` object and expects non-default behavior, Kubernetes will not be able to interact with it correctly (e.g. it may call attach even though the driver requested that attach not be called, resulting in volumes not mounting). Therefore, cluster admins must ensure that if they downgrade, they also change the CSI driver, if needed.
It's very likely that the driver will be the same in this case. If I have a Gluster CSI driver working in 1.11 with the external attacher, I can use the same driver in 1.12 with the external attacher, or I may create a CSIDriver for it and stop the attacher. The driver itself is the same in all three cases, the PVs are the same, and the pod definitions are the same.

Downgrade from a non-attachable driver with a CSIDriver instance to 1.11 is not as easy as you describe; some volumes can be mounted to running pods and we need to make sure they remain mounted after downgrade and get unmounted when the pod is deleted. Let's discuss in #2523
looks good to me.
#### CR Creation

When a CSI volume driver is deployed on Kubernetes, it may optionally register itself with Kubernetes by creating a new `CSIDriver` object.
The Kubernetes team will modify the CSI [`driver-registrar` container](https://github.com/kubernetes-csi/driver-registrar) to automatically do this.
Hmm... thinking about this some more, with my operator hat on, I'd rather it not. If the driver-registrar were to do it, then we'd have to RBAC it well in order to restrict its access to a safe subset of things. But if you are going through that much effort, you could instead just upload the CRD directly in whatever CSI packaging is provided, like via k8s YAML / Helm chart applied to the cluster with your admin credentials, and not worry about it at all. Similar to some of the reasoning Rook used when they tightened security (see https://github.com/rook/rook/blob/master/design/security-model.md for details).
```go
// will not be passed on mount.
// +optional
PodInfoRequiredOnMount *bool `json:"podInfoRequiredOnMount"`
}
```
EDITED:

Saad, please add a field to indicate whether `volumeHandle` needs to be automatically generated. I am proposing the following:

```go
type VolumeHandleMode string

const (
	AutomaticVolumeHandleGeneration VolumeHandleMode = "automaticVolumeHandleGeneration"
	NoVolumeHandleGeneration        VolumeHandleMode = "noVolumeHandleGeneration"
)

// < snip >

type CSIDriverSpec struct {
	// < snip >

	// Indicates how the volumeHandle for inline CSI volumes will be handled during volume operations.
	// If set to `AutomaticVolumeHandleGeneration`, this provides a hint to the Kubelet to
	// automatically generate handles for the inline volume.
	//
	// If set to `NoVolumeHandleGeneration`, this is a hint to Kubelet not to automatically
	// generate handles for volumes when not provided.
	VolumeHandleMode *VolumeHandleMode `json:"volumeHandleMode"`
}
```
Should this be limited only to inline volumes? Or should handles for PVs be (optionally) autogenerated too?
Hmm... PVs too... say someone used the image driver with a PV (for some reason). It would break horribly if it wasn't given proper handles.
I think for now the language and implementation should be limited to Inline.
Suggestion for a CSIDriverSpec field to indicate how the volume handle should be handled.
```go
// (like podName, podUID, etc.) during mount operations.
// If this is set to true, Kubelet will pass pod information as
// VolumeAttributes in the CSI NodePublishVolume() calls.
// If value is not specified, default is false -- meaning pod information
```
Why not pass this always?
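For concreteness, a small sketch of what "pass pod information as VolumeAttributes" could look like on the kubelet side; the attribute key names below are illustrative assumptions, not fixed by this proposal. Passing the information unconditionally, as the question suggests, would simply mean dropping the flag check.

```go
package kubeletcsi

// buildNodePublishAttributes merges pod details into the attributes map sent
// to the CSI driver in NodePublishVolume, but only when the driver opted in
// via its CSIDriver object. Key names are illustrative.
func buildNodePublishAttributes(base map[string]string, podInfoOnMount bool,
	podName, podNamespace, podUID, serviceAccount string) map[string]string {

	attrs := make(map[string]string, len(base)+4)
	for k, v := range base { // attributes coming from the PV / volume source
		attrs[k] = v
	}
	if podInfoOnMount {
		attrs["csi.storage.k8s.io/pod.name"] = podName
		attrs["csi.storage.k8s.io/pod.namespace"] = podNamespace
		attrs["csi.storage.k8s.io/pod.uid"] = podUID
		attrs["csi.storage.k8s.io/serviceAccount.name"] = serviceAccount
	}
	return attrs
}
```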
```go
// implements the CSI ControllerPublishVolume() method), and that Kubernetes
// should call attach and wait for any attach operation to complete before
// proceeding to mounting.
// If value is not specified, default is true -- meaning attach will be
```
Is this a default as in the normal defaults (i.e. it will be set in the object), or is it the implied behavior? Setting a default will require admission control... Just clarify.
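To make the two possible readings concrete, a hedged sketch (type and function names are illustrative): either admission-time defaulting writes the value into the stored object, or the nil value stays and every consumer interprets it at read time.

```go
package csidefaults

// Minimal stand-in for the proposed spec so the sketch is self-contained.
type CSIDriverSpec struct {
	AttachRequired *bool `json:"attachRequired"`
}

// Reading 1: a "normal" default, materialized in the object. For a CRD this
// would need a mutating admission webhook to run this at create time.
func setCSIDriverSpecDefaults(spec *CSIDriverSpec) {
	if spec.AttachRequired == nil {
		attach := true
		spec.AttachRequired = &attach
	}
}

// Reading 2: implied behavior only. The stored field may stay nil and every
// consumer (A/D controller, kubelet) treats nil as "attach is required".
func attachRequired(spec *CSIDriverSpec) bool {
	return spec.AttachRequired == nil || *spec.AttachRequired
}
```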
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

CSI Cluster Registry and Node Info CRDs

**What this PR does / why we need it**: Introduces the new `CSIDriver` and `CSINodeInfo` API objects as proposed in kubernetes/community#2514 and kubernetes/community#2034

**Which issue(s) this PR fixes**: Fixes kubernetes/enhancements#594

**Special notes for your reviewer**: Per the discussion in https://groups.google.com/d/msg/kubernetes-sig-storage-wg-csi/x5CchIP9qiI/D_TyOrn2CwAJ the API is being added to the staging directory of the `kubernetes/kubernetes` repo because the consumers will be the attach/detach controller and possibly the kubelet, but it will be installed as a CRD (because we want to move in the direction where the API server is Kubernetes agnostic, and all Kubernetes-specific types are installed).

**Release note**:
```release-note
Introduce CSI Cluster Registration mechanism to ease CSI plugin discovery and allow CSI drivers to customize Kubernetes' interaction with them.
```

CC @jsafrane

Kubernetes-commit: 85300f4f5dd7b0bd36d0538fb6c3255c06d5e6c2
If this is still relevant, please recreate as a KEP in the enhancements repo.
```go
// If value is not specified, default is true -- meaning attach will be
// called.
// +optional
AttachRequired *bool `json:"attachRequired"`
```
Probably a dumb question: isn't this value already exposed from `ControllerServiceCapability` as `PUBLISH_UNPUBLISH_VOLUME`? Can't kubelet make a decision based on that value?
The problem is where the call is being made. Kubelet is not doing the attach. A pod floating in the cluster takes attachment requests and converts them into the driver calls for attach/detach.

You could have the attacher sidecar running, probe the driver as you say, and then determine that it doesn't ever have to do anything. That does work today. But that's a fair amount of extra plumbing running in the cluster that isn't needed.

The flag above tells Kubernetes to skip entirely any of the handshaking at the cluster level required to do the attach, so the sidecar that does nothing, and its driver, don't even need to exist.
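A sketch of the handshake that gets skipped (function name and wiring are illustrative, not from the proposal): with attach required, Kubernetes creates a VolumeAttachment and waits for the external attacher to mark it attached before mounting; with attach not required, there is nothing to wait for and no attacher sidecar needs to exist.

```go
package csiattach

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForAttach waits until the external attacher marks the VolumeAttachment
// as attached, or returns immediately when the driver declared that attach is
// not required (in which case no VolumeAttachment is ever created for it).
func waitForAttach(ctx context.Context, client kubernetes.Interface,
	attachmentName string, attachRequired bool) error {

	if !attachRequired {
		return nil // skip the cluster-level handshake entirely
	}
	for {
		va, err := client.StorageV1().VolumeAttachments().Get(ctx, attachmentName, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if va.Status.Attached {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```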
This proposal introduces a mechanism for CSI volume plugins to register themselves with Kubernetes and configure how Kubernetes interacts with them.
Still a work in progress, but feel free to start commenting.
xref kubernetes/enhancements#594
/assign @jsafrane @thockin