Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for CSI Cluster Registry #2514

Closed
wants to merge 3 commits into from

Conversation

saad-ali
Copy link
Member

@saad-ali saad-ali commented Aug 10, 2018

This proposal introduces a mechanism for CSI volume plugins to register themselves with Kubernetes and configure how Kubernetes interacts with them.

Still a work in progress, but feel free to start commenting.

xref kubernetes/enhancements#594

/assign @jsafrane @thockin

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 10, 2018
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Aug 10, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: saad-ali

If they are not already assigned, you can assign the PR to them by writing /assign @saad-ali in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from childsb and jbeda August 10, 2018 16:41
// (because it does not implement the ControllerPublishVolume() method),
// and that Kubernetes should not call or wait for any attach operation and
// just skip to mounting.
SkipAttach bool `json:"skipAttach"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nitpicking: I think that CSIPluginSpec should be specification of the driver, not how Kubernetes should handle it. And the driver does not require attach, so RequiresAttach or something similar would be more appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the renaming.

}

// CSIPluginSpec is the specification of a CSIPlugin.
type CSIPluginSpec struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a flag to pass Pod info in NodePublish.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

The Kubernetes team will modify the CSI [`driver-registrar` container](https://github.com/kubernetes-csi/driver-registrar) to automatically do this.
The plugin must set the `Driver` field to the same name returned by the CSI `GetPluginName()` call for that driver.
The plugin may set any optional configuration fields (like `SkipAttach`) as appropriate.
When the plugin is ready to serve, it must set `Ready` in the status to `true`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who sets Ready when the driver uses registration by driver-registrar?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsafrane If CSIPlugin object had something like a Registered field which is set by the kubelet's informer, then driver-registrar could watch for it and then set CSIPlugin as Ready once it has been seen. In this case there is no any race.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It would have to be the kubelet.

#### CRD Installation

The `CSIPlugin` object schema will checked in to kubernetes/kubernetes repo under a storage directory.
The `CSIPlugin` `CustomResourceDefinition` (CRD) will be installed by the exiting Kubernetes attach/detach controller that is part of the kube-controller-manager binary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer single place for registration of CRD for all components, there will be more coming if CRD is the right way for new API objects now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. But it would be easier to use an existing controller for the alpha version.

I changed it to say new controller -- if we have enough time, I'll do new controller, otherwise I'll just do existing controller and we can add in a standalone controller later for beta.

type CSIPluginStatus struct {
// Indicates the volume plugin has been successfully deployed on the cluster
// and is ready to use.
Ready bool `json:"ready"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is consumer of this information? Shall A/D controller postpone any attach operation until CSI plugin gets Ready? Shall Kubelet postpone mounting of volumes? Who/when sets it to true? Most importantly, who/when sets it to false? What should A/D controller and kubelet do when the plugin becomes Unready?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the initial version no consumers. In the future we can use this as one of the bits for smarter CSI volume aware scheduling.

type CSIPluginStatus struct {
// Indicates the volume plugin has been successfully deployed on the cluster
// and is ready to use.
Ready bool `json:"ready"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this should be an condition, with timestamp when the driver has become (un-)ready, with a message why and so on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant on expanding the API more for a hypothetical use case. I'd prefer to leave it simple or drop it altogether until we have a real use case. If I drop it, there would be nothing in CSIPluginStatus so I would have to drop that for now too. What do you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am torn on this. Currently I don't see any useful Status to report. But I may come in future. Can we have an object with .Spec and no .Status?

If we use Ready, we need clear semantics what it means and most importantly what is it good for. Is it enough to have one node with CSI driver installed to have the global object ready? Is it enough to have one node with CSI driver broken (failed liveness probe, driver not installed there, crash loop...) to have the global object not-ready?

In the future we can use this as one of the bits for smarter CSI volume aware scheduling.

For smarter CSI volume aware scheduling I would use decision based on availability / status of drivers on nodes, not a global object. What should scheduler do when the global object is not ready? All it can do is to stop scheduling.


```go
// CSIPlugin captures information about a Container Storage Interface (CSI)
// volume plugin deployed on the cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call them CSI drivers everywhere else. Or did I miss anything?

@saad-ali
Copy link
Member Author

@jsafrane Addressed feedback, PTAL

Copy link
Member

@jsafrane jsafrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks usable, I still have some doubts about Ready status though. Time for API review?

type CSIDriver struct {
metav1.TypeMeta `json:",inline"`

// Standard object metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong indentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


## Design Overview

A new custom resource will be automatically be installed on Kubernetes clusters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a new custom resource definition? CR is optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


A new custom resource will be automatically be installed on Kubernetes clusters.

Upon deployment a driver must create a new custom resource object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driver may create a new object? It's optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

type CSIDriverStatus struct {
// Indicates the volume driver has been successfully deployed on the cluster
// and is ready to use.
Ready bool `json:"ready"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still unclear to me what "Ready" means. "the volume driver has been successfully deployed" - so all containers (DaemonSet with node components and Deployment with controller components) are running? What one of them fails, who sets Ready=false? What if I add a new node, shall something set it to Ready=false, wait until the DaemonSet runs a new pod with node components there and set it to Ready=true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove

gnufied added a commit to gnufied/community that referenced this pull request Aug 15, 2018
We can also use kubernetes#2514
when it becomes available.
gnufied added a commit to gnufied/community that referenced this pull request Aug 15, 2018
We can also use kubernetes#2514
when it becomes available.
gnufied added a commit to gnufied/community that referenced this pull request Aug 15, 2018
We can also use kubernetes#2514
when it becomes available.
When a CSI volume driver is deployed on Kubernetes, it may optionally, register it self with Kubernetes by creating a new `CSIDriver` object.
The Kubernetes team will modify the CSI [`driver-registrar` container](https://github.com/kubernetes-csi/driver-registrar) to automatically do this.
If kubelet driver registration is enabled, the kubelet will automatically do this as part of plugin registration.
The driver must set the `Driver` field to the same name returned by the CSI `GetPluginName()` call for that driver.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean that a CSI driver has to ship with some mechanism of instantiating CRs when it is deployed? Currently obviously that is not part of CSI spec, so how will that get done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean that a CSI driver has to ship with some mechanism of instantiating CRs when it is deployed?

CSI driver authors can choose to but don't have to

Currently obviously that is not part of CSI spec, so how will that get done?

The same way all custom k8s behavior is done, side car containers provided by us (e.g. driver-registrar, external-attacher, etc.).


// Standard object metadata.
// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
metav1.ObjectMeta `json:"metadata"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we require that ObjectMeta.Name is the CSI driver name? How else can we ensure that each CSI driver has just one CSIDriver instance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should require it. We should have validation webhook to verify that you can't create two objects with the same CSIDriverSpec.Driver fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation is IMO racy. There may be several API servers running in parallel.

  1. API server A gets POST CSIDriver.name=X for a driver.
  2. API server B gets POST CSIDriver.name=Y for the same driver at the same time.
  3. Both call validation webhook.
  4. The webhook gets two queries, possibly in parallel. Let's say the webhook synchronizes the requests and processes X fist.
  5. The webhook can GET CSIDriver to check if it already exists, but in this case it does not return anything useful, X and Y were not created yet.
  6. The webhook allows X to be created. It remembers that X exists in local cache.
  7. The webhook processes Y, it can check in the API server if X exists. It does not exist yet, either the API server is slow or it was rejected by another webhook / validation.
  8. The webhook can check internal cache if it allowed X. It was allowed, so it rejects Y.

And this is wrong, because X might have been rejected by API server (e.g. another webhook failed) and Y should have been allowed.

@saad-ali saad-ali changed the title [WIP] Proposal for CSI Cluster Registry Proposal for CSI Cluster Registry Aug 17, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 17, 2018
@saad-ali
Copy link
Member Author

@thockin @kubernetes/api-approvers PTAL
@kubernetes/sig-storage-api-reviews is happy with this.
I'm going to start implementing this to unblock other designs pending on this.

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Aug 17, 2018
And Kubernetes will continue to interact with those drivers as it does today.
New drivers MAY create a `CSIDriver` object to change how Kubernetes interacts with them.

If a Kubernetes cluster is downgraded to a version that does not support `CSIDriver`, but continues to have a CSI driver deployed on it that creates a `CSIDriver` object and expects non-default behavior, Kubernetes will not be able to interact with it correctly (e.g. it may call attach when driver requests attach not to be called resulting in the volumes not mounting). Therefore, cluster admins must ensure if they downgrade, that they also change the CSI driver, if needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very likely that the driver will be the same in this case. If I have Gluster CSI driver working in 1.11 with external attacher, I can use the same driver in 1.12 with the external attacher or I may create CSIDriver for it and stop the attacher. The driver itself is the same in all three cases, PVs are the same and pod definitions are the same.

Downgrade from a non-attachable driver with a CSIDriver instance to 1.11 is not that easy as you describe, some volumes can be mounted to running pods and we need to make sure they remain mounted after downgrade and unmounted when the pod is deleted. Let's discuss in #2523

@kfox1111
Copy link

looks good to me.

#### CR Creation

When a CSI volume driver is deployed on Kubernetes, it may optionally, register it self with Kubernetes by creating a new `CSIDriver` object.
The Kubernetes team will modify the CSI [`driver-registrar` container](https://github.com/kubernetes-csi/driver-registrar) to automatically do this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.... Thinking about this some more, with my operator hat on, I'd rather it not. If the driver-registrar were to do it, then we'd have to rbac it well in order to restrict its access to a safe subset of things. but if you are going through that much effort, you could instead just upload the crd directly in whatever csi packaging is provided, like via k8s yaml / helm chart to the cluster using your admin credentials instead and not worry about it at all. Similar to some of the reasoning Rook took when they tightened security (https://github.com/rook/rook/blob/master/design/security-model.md for details)

// will not be passed on mount.
// +optional
PodInfoRequiredOnMount *bool `json:"podInfoRequiredOnMount"`
}
Copy link
Member

@vladimirvivien vladimirvivien Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDITED:
Saad, please add field to indicate whether volumeHandle needs to be automatically generated. I am proposing the following:

type VolumeHandleMode string

const (
    AutomaticVolumeHandleGeneration VolumeHandleMode = "automaticVolumeHandleGeneration"
    NoVolumeHandleGeneration VolumeHandleMode = "noVolumeHandleGeneration"
)

< snip >

type CSIDriverSpec struct {
   < snip >

    // Indicates how the volumeHandle for inline CSI volumes will be handled during volume operations.
    // If set to AutomaticVolumeHandleGeneration, this provides a hint to the Kubelet to 
    // automatically generate handles for the inline volume.
    //
    // If set to `NoVolumeHandleGeneration`, this is a hint to Kubelet not to automatically 
    // generate handles for volumes when not provided.
    VolumeHandleMode *VolumeHandleMode   `json:"volumeHandleMode"`
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be limited only to inline volumes? Or handles for PVs should be (optionally) autogenerated too?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... pv's too.... say someone used the image driver with a pv (for some reason). It would break horribly if it wasn't given proper handles.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now the language and implementation should be limited to Inline.

Copy link
Member

@vladimirvivien vladimirvivien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for CSIDriverSpec field to indicate how to handle volume handle.

// (like podName, podUID, etc.) during mount operations.
// If this is set to true, Kubelet will pass pod information as
// VolumeAttributes in the CSI NodePublishVolume() calls.
// If value is not specified, default is false -- meaning pod information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass this always?

// implements the CSI ControllerPublishVolume() method), and that Kubernetes
// should call attach and wait for any attach operation to complete before
// proceeding to mounting.
// If value is not specified, default is true -- meaning attach will be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a default as in the normal defaults (will be set in the object) or is it the implied behavior? To set a default will require admission control... Just clarify.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Aug 31, 2018
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

CSI Cluster Registry and Node Info CRDs

**What this PR does / why we need it**:
Introduces the new `CSIDriver` and `CSINodeInfo` API Object as proposed in kubernetes/community#2514 and kubernetes/community#2034

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/enhancements#594

**Special notes for your reviewer**:
Per the discussion in https://groups.google.com/d/msg/kubernetes-sig-storage-wg-csi/x5CchIP9qiI/D_TyOrn2CwAJ the API is being added to the staging directory of the `kubernetes/kubernetes` repo because the consumers will be attach/detach controller and possibly kubelet, but it will be installed as a CRD (because we want to move in the direction where the API server is Kubernetes agnostic, and all Kubernetes specific types are installed).

**Release note**:

```release-note
Introduce CSI Cluster Registration mechanism to ease CSI plugin discovery and allow CSI drivers to customize Kubernetes' interaction with them.
```

CC @jsafrane
sttts pushed a commit to sttts/csi-api that referenced this pull request Sep 4, 2018
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

CSI Cluster Registry and Node Info CRDs

**What this PR does / why we need it**:
Introduces the new `CSIDriver` and `CSINodeInfo` API Object as proposed in kubernetes/community#2514 and kubernetes/community#2034

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/enhancements#594

**Special notes for your reviewer**:
Per the discussion in https://groups.google.com/d/msg/kubernetes-sig-storage-wg-csi/x5CchIP9qiI/D_TyOrn2CwAJ the API is being added to the staging directory of the `kubernetes/kubernetes` repo because the consumers will be attach/detach controller and possibly kubelet, but it will be installed as a CRD (because we want to move in the direction where the API server is Kubernetes agnostic, and all Kubernetes specific types are installed).

**Release note**:

```release-note
Introduce CSI Cluster Registration mechanism to ease CSI plugin discovery and allow CSI drivers to customize Kubernetes' interaction with them.
```

CC @jsafrane

Kubernetes-commit: 85300f4f5dd7b0bd36d0538fb6c3255c06d5e6c2
k8s-publishing-bot added a commit to kubernetes-retired/csi-api that referenced this pull request Sep 4, 2018
Automatic merge from submit-queue (batch tested with PRs 64283, 67910, 67803, 68100). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

CSI Cluster Registry and Node Info CRDs

**What this PR does / why we need it**:
Introduces the new `CSIDriver` and `CSINodeInfo` API Object as proposed in kubernetes/community#2514 and kubernetes/community#2034

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/enhancements#594

**Special notes for your reviewer**:
Per the discussion in https://groups.google.com/d/msg/kubernetes-sig-storage-wg-csi/x5CchIP9qiI/D_TyOrn2CwAJ the API is being added to the staging directory of the `kubernetes/kubernetes` repo because the consumers will be attach/detach controller and possibly kubelet, but it will be installed as a CRD (because we want to move in the direction where the API server is Kubernetes agnostic, and all Kubernetes specific types are installed).

**Release note**:

```release-note
Introduce CSI Cluster Registration mechanism to ease CSI plugin discovery and allow CSI drivers to customize Kubernetes' interaction with them.
```

CC @jsafrane

Kubernetes-commit: 85300f4f5dd7b0bd36d0538fb6c3255c06d5e6c2
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2019
@bgrant0607
Copy link
Member

If this is still relevant, please recreate as a KEP in the enhancements repo.
/close

@bgrant0607 bgrant0607 closed this Jan 25, 2019
// If value is not specified, default is true -- meaning attach will be
// called.
// +optional
AttachRequired *bool `json:"attachRequired"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a dump question.

Isn't this value already exposed from ControllerServiceCapability as PUBLISH_UNPUBLISH_VOLUME? Can't kubelet make decision based on that value?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is where the call is being made. Kubelet is not doing the attach. A pod floating in the cluster takes attachment requests, converts them into the driver calls for attach/detach.

You could have the attacher sidecar running, probe the driver as you say, and then determine that it doesn't ever have to do anything. That does work today.

But that's a fair amount of extra pluming running in the cluster that isn't needed.

The flag above tells Kubernetes to skip entirely any of the handshaking at the cluster level required to do that attach so the sidecar that does nothing and its driver doesn't even need to exist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.