Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for CSI Cluster Registry #2514

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
# CSI Cluster Registry Design Doc

***Status:*** Pending

***Version:*** Alpha

***Author:*** Saad Ali ([@saad-ali](https://github.com/saad-ali), [email protected])

## Terminology

Term | Definition
---|---
Container Storage Interface (CSI) | A specification attempting to establish an industry standard interface that Container Orchestration Systems (COs) can use to expose arbitrary storage systems to their containerized workloads.
CSI Volume Plugin | A new, in-tree volume plugin that acts as an adapter and enables out-of-tree, third-party CSI volume drivers to be used in Kubernetes.
CSI Volume Driver | An out-of-tree CSI compatible implementation of a volume plugin that can be used in Kubernetes through the Kubernetes CSI Volume Plugin.

# Summary

This document proposes a "CSI cluster registration mechanism".

This mechanism will enable:
* A CSI volume driver to register itself with Kubernetes when it is deployed.
* A CSI volume driver to customize how Kubernetes interacts with it (e.g. skip attach process because the driver doesn't support ControllerPublish, etc.).
* A user or cluster admin to easily discover which CSI volume drivers are deployed on their Kubernetes cluster.
* Be optional -- a driver may choose not to use it (and will get the default set of behaviors).


## Background & Motivations

Kubernetes supports the Container Storage Interface (CSI) to enable third party storage developers to deploy volume drivers exposing new storage systems in Kubernetes without having to touch the core Kubernetes code.
Support for CSI was introduced as alpha in Kubernetes v1.9, and moved to beta in v1.10.
See "CSI Volume Plugins in Kubernetes Design Doc" in the "Links" section below for details on how an arbitrary CSI volume driver is deployed and interacts with Kubernetes.
The beta implementation of CSI has a number of limitations, including:
* CSI drivers must be deployed with the provided CSI external-attacher sidecar container, even if they don’t implement ControllerPublishVolume.
* Meaning a CSI volume driver deployed on kubernetes has to deploy a CSI `external-attacher` container even if the volume driver doesn't require a "volume attach" operation.
* In this case, the CSI `external-attacher` container basically does a no-op in response to Kubernetes `VolumeAttachment` objects to allow Kubernetes to continue with the mounting process.
* Users and cluster admins have no easy way to discover what CSI volume drivers are deployed on their Kubernetes cluster.

The proposed "CSI cluster registration mechanism" should address these issues.

### Links

* [Container Storage Interface (CSI) Spec](https://github.com/container-storage-interface/spec/blob/master/spec.md)
* [CSI Volume Plugins in Kubernetes Design Doc](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md)

## Objective

The objective of this document is to document all the requirements for enabling a cluster level registry for a CSI compliant volume drivers in Kubernetes.

## Goals

* Allow a CSI volume driver (that opts-in to new mechanism) to configure how Kubernetes should interact with it.
* Improve discoverability by users/cluster admins of a CSI volume driver (that opts-in to new mechanism) deployed on the Kubernetes cluster.

## Non-Goals

* Define how CSI volume driver should be deployed on Kubernetes.
* Require use of new CSI cluster registry mechanism for all CSI volume drivers deployed on Kubernetes (the mechanism will be optional/opt-in).

## Design Overview

A new custom resource definition will automatically be installed on Kubernetes clusters.

Upon deployment a driver MAY create a new custom resource object.

## Design Details

### `CSIDriver` Object

#### Proposed API

```go
// CSIDriver captures information about a Container Storage Interface (CSI)
// volume driver deployed on the cluster.
//
// CSIDriver objects are non-namespaced.
type CSIDriver struct {
metav1.TypeMeta `json:",inline"`

// Standard object metadata.
// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
metav1.ObjectMeta `json:"metadata"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we require that ObjectMeta.Name is the CSI driver name? How else can we ensure that each CSI driver has just one CSIDriver instance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should require it. We should have validation webhook to verify that you can't create two objects with the same CSIDriverSpec.Driver fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation is IMO racy. There may be several API servers running in parallel.

  1. API server A gets POST CSIDriver.name=X for a driver.
  2. API server B gets POST CSIDriver.name=Y for the same driver at the same time.
  3. Both call validation webhook.
  4. The webhook gets two queries, possibly in parallel. Let's say the webhook synchronizes the requests and processes X fist.
  5. The webhook can GET CSIDriver to check if it already exists, but in this case it does not return anything useful, X and Y were not created yet.
  6. The webhook allows X to be created. It remembers that X exists in local cache.
  7. The webhook processes Y, it can check in the API server if X exists. It does not exist yet, either the API server is slow or it was rejected by another webhook / validation.
  8. The webhook can check internal cache if it allowed X. It was allowed, so it rejects Y.

And this is wrong, because X might have been rejected by API server (e.g. another webhook failed) and Y should have been allowed.


// Specification describing the CSI volume driver and any custom
// configuration for it.
Spec CSIDriverSpec `json:"spec"`
}

// CSIDriverList is a collection of CSIDriver objects.
type CSIDriverList struct {
metav1.TypeMeta `json:",inline"`

// Standard list metadata
// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
metav1.ListMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

// Items is the list of CSIDrivers
Items []CSIDriver `json:"items"`
}

// CSIDriverSpec is the specification of a CSIDriver.
type CSIDriverSpec struct {
// Driver indicates the name of the CSI driver that this object refers to.
// This MUST be the same name returned by the CSI GetPluginName() call for
// that driver.
Driver string `json:"driver"`

// Indicates this CSI volume driver requires an attach operation (because it
// implements the CSI ControllerPublishVolume() method), and that Kubernetes
// should call attach and wait for any attach operation to complete before
// proceeding to mounting.
// If value is not specified, default is true -- meaning attach will be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a default as in the normal defaults (will be set in the object) or is it the implied behavior? To set a default will require admission control... Just clarify.

// called.
// +optional
AttachRequired *bool `json:"attachRequired"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a dump question.

Isn't this value already exposed from ControllerServiceCapability as PUBLISH_UNPUBLISH_VOLUME? Can't kubelet make decision based on that value?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is where the call is being made. Kubelet is not doing the attach. A pod floating in the cluster takes attachment requests, converts them into the driver calls for attach/detach.

You could have the attacher sidecar running, probe the driver as you say, and then determine that it doesn't ever have to do anything. That does work today.

But that's a fair amount of extra pluming running in the cluster that isn't needed.

The flag above tells Kubernetes to skip entirely any of the handshaking at the cluster level required to do that attach so the sidecar that does nothing and its driver doesn't even need to exist.


// Indicates this CSI volume driver requires additional pod information
// (like podName, podUID, etc.) during mount operations.
// If this is set to true, Kubelet will pass pod information as
// VolumeAttributes in the CSI NodePublishVolume() calls.
// If value is not specified, default is false -- meaning pod information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass this always?

// will not be passed on mount.
// +optional
PodInfoRequiredOnMount *bool `json:"podInfoRequiredOnMount"`
}
Copy link
Member

@vladimirvivien vladimirvivien Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDITED:
Saad, please add field to indicate whether volumeHandle needs to be automatically generated. I am proposing the following:

type VolumeHandleMode string

const (
    AutomaticVolumeHandleGeneration VolumeHandleMode = "automaticVolumeHandleGeneration"
    NoVolumeHandleGeneration VolumeHandleMode = "noVolumeHandleGeneration"
)

< snip >

type CSIDriverSpec struct {
   < snip >

    // Indicates how the volumeHandle for inline CSI volumes will be handled during volume operations.
    // If set to AutomaticVolumeHandleGeneration, this provides a hint to the Kubelet to 
    // automatically generate handles for the inline volume.
    //
    // If set to `NoVolumeHandleGeneration`, this is a hint to Kubelet not to automatically 
    // generate handles for volumes when not provided.
    VolumeHandleMode *VolumeHandleMode   `json:"volumeHandleMode"`
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be limited only to inline volumes? Or handles for PVs should be (optionally) autogenerated too?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... pv's too.... say someone used the image driver with a pv (for some reason). It would break horribly if it wasn't given proper handles.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now the language and implementation should be limited to Inline.


```

#### CRD Installation

The `CSIDriver` object schema will checked in to kubernetes/kubernetes repo under a storage directory.
The `CSIDriver` `CustomResourceDefinition` (CRD) will be installed by a new Kubernetes controller that is responsible for ensuring required CRDs are installed.
The controller will periodically verify that the CRD is still installed, and recreate it, if it is not.

#### CR Creation

When a CSI volume driver is deployed on Kubernetes, it may optionally, register it self with Kubernetes by creating a new `CSIDriver` object.
The Kubernetes team will modify the CSI [`driver-registrar` container](https://github.com/kubernetes-csi/driver-registrar) to automatically do this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.... Thinking about this some more, with my operator hat on, I'd rather it not. If the driver-registrar were to do it, then we'd have to rbac it well in order to restrict its access to a safe subset of things. but if you are going through that much effort, you could instead just upload the crd directly in whatever csi packaging is provided, like via k8s yaml / helm chart to the cluster using your admin credentials instead and not worry about it at all. Similar to some of the reasoning Rook took when they tightened security (https://github.com/rook/rook/blob/master/design/security-model.md for details)

If kubelet driver registration is enabled, the kubelet will automatically do this as part of plugin registration.
The driver must set the `Driver` field to the same name returned by the CSI `GetPluginName()` call for that driver.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean that a CSI driver has to ship with some mechanism of instantiating CRs when it is deployed? Currently obviously that is not part of CSI spec, so how will that get done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean that a CSI driver has to ship with some mechanism of instantiating CRs when it is deployed?

CSI driver authors can choose to but don't have to

Currently obviously that is not part of CSI spec, so how will that get done?

The same way all custom k8s behavior is done, side car containers provided by us (e.g. driver-registrar, external-attacher, etc.).

The driver may set any optional configuration fields (like `SkipAttach`) as appropriate.
When the driver is ready to serve, it must set `Ready` in the status to `true`.

#### Upgrade/Downgrade
This change is backwards compatible.
Existing CSI drivers that are already deployed will not create a `CSIDriver` object.
And Kubernetes will continue to interact with those drivers as it does today.
New drivers MAY create a `CSIDriver` object to change how Kubernetes interacts with them.

If a Kubernetes cluster is downgraded to a version that does not support `CSIDriver`, but continues to have a CSI driver deployed on it that creates a `CSIDriver` object and expects non-default behavior, Kubernetes will not be able to interact with it correctly (e.g. it may call attach when driver requests attach not to be called resulting in the volumes not mounting). Therefore, cluster admins must ensure if they downgrade, that they also change the CSI driver, if needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very likely that the driver will be the same in this case. If I have Gluster CSI driver working in 1.11 with external attacher, I can use the same driver in 1.12 with the external attacher or I may create CSIDriver for it and stop the attacher. The driver itself is the same in all three cases, PVs are the same and pod definitions are the same.

Downgrade from a non-attachable driver with a CSIDriver instance to 1.11 is not that easy as you describe, some volumes can be mounted to running pods and we need to make sure they remain mounted after downgrade and unmounted when the pod is deleted. Let's discuss in #2523


## Alternatives Considered
Kubernetes already has a kubelet plugin registration mechanism.
So we considered putting this information in the `Node.Status` field.
We realized that there are some plugin level attributes and forcing them in to node level fields could result in inconsistency.
Different nodes could report different values for the same plugin attribute (like supports attach or not) for the same driver.