diff --git a/contributors/design-proposals/storage/csi-snapshot.md b/contributors/design-proposals/storage/csi-snapshot.md
new file mode 100644
index 00000000000..8eb77779626
--- /dev/null
+++ b/contributors/design-proposals/storage/csi-snapshot.md
@@ -0,0 +1,377 @@

Kubernetes CSI Snapshot Proposal
================================

**Authors:** [Jing Xu](https://github.com/jingxu97), [Xing Yang](https://github.com/xing-yang), [Tomas Smetana](https://github.com/tsmetana), [Huamin Chen](https://github.com/rootfs)

## Background

Many storage systems (GCE PD, Amazon EBS, etc.) provide the ability to create "snapshots" of persistent volumes to protect against data loss. Snapshots can be used in place of a traditional backup system to back up and restore primary and critical data. Snapshots allow for quick data backup (for example, it takes a fraction of a second to create a GCE PD snapshot) and offer fast recovery time objectives (RTOs) and recovery point objectives (RPOs). Snapshots can also be used for data replication, distribution, and migration.

As the initial effort to support snapshots in Kubernetes, volume snapshotting was released as a prototype in Kubernetes 1.8. An external controller and provisioner (i.e., two separate binaries) have been added in the [external storage repo](https://github.com/kubernetes-incubator/external-storage/tree/master/snapshot). The prototype currently supports GCE PD, AWS EBS, OpenStack Cinder, GlusterFS, and Kubernetes hostPath volumes. The volume snapshot APIs are implemented using [CRD](https://kubernetes.io/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions/).

To continue that effort, this design proposes adding snapshot support for CSI Volume Drivers. Because the overall trend in Kubernetes is to keep the core APIs as small as possible and to use CRDs for everything else, this proposal adds CRD definitions to represent snapshots, adds an external snapshot controller to handle volume snapshotting, and modifies the in-tree PV controller and the out-of-tree external provisioner to support restoring volumes from snapshots. To be consistent with the existing CSI Volume Driver support documented [here](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md), a sidecar "Kubernetes to CSI" proxy container called "external-snapshotter" will be provided to watch the Kubernetes API on behalf of the out-of-tree CSI Volume Driver and trigger the appropriate operations (i.e., create snapshot and delete snapshot) against the "CSI Volume Driver" container. The CSI snapshot spec is proposed [here](https://github.com/container-storage-interface/spec/pull/224).


## Objectives

For the first version of snapshotting support in Kubernetes, only on-demand snapshots for CSI Volume Drivers will be supported.


### Goals

* Goal 1: Expose standardized snapshotting operations to create, list, and delete snapshots in the Kubernetes REST API.
Initially the APIs will be implemented with CRDs (CustomResourceDefinitions).

* Goal 2: Implement CSI volume snapshot support.
An external snapshot controller will be deployed with the other external components (e.g., external-attacher, external-provisioner) for each CSI Volume Driver.

* Goal 3: Provide a convenient way of creating new volumes and restoring existing volumes from snapshots.


### Non-Goals

The following are non-goals for the current phase, but will be considered in a later phase.

* Goal 4: Offer application-consistent snapshots by providing pre/post snapshot hooks to freeze/unfreeze applications and/or unmount/mount the file system.

* Goal 5: Provide higher-level management for backing up and restoring Pods and StatefulSets.


## Design Overview

With this proposal, volume snapshots are considered another type of storage resource managed by Kubernetes. Therefore the snapshot API and controller follow the design of the existing volume management. There are two APIs, VolumeSnapshot and VolumeSnapshotContent, which are similar in structure to PersistentVolumeClaim and PersistentVolume. The external snapshot controller functions similarly to the in-tree PV controller. Along with the snapshot APIs, we also propose adding a data source to the PersistentVolumeClaim (PVC) API in order to support restoring snapshots to volumes. The following sections explain the APIs and the controller design in more detail.


## Design Details

### Snapshot API Design

The API design of VolumeSnapshot and VolumeSnapshotContent is modeled after PersistentVolumeClaim and PersistentVolume.

#### The `VolumeSnapshot` Object

```GO

// The volume snapshot object accessible to the user. Upon successful creation of the actual
// snapshot by the volume provider it is bound to the corresponding VolumeSnapshotContent through
// the VolumeSnapshotSpec
type VolumeSnapshot struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

    // Spec represents the desired state of the snapshot
    // +optional
    Spec VolumeSnapshotSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"`

    // Status represents the latest observed state of the snapshot
    // +optional
    Status VolumeSnapshotStatus `json:"status" protobuf:"bytes,3,opt,name=status"`
}

type VolumeSnapshotList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

    Items []VolumeSnapshot `json:"items"`
}

// VolumeSnapshotSpec is the desired state of the volume snapshot
type VolumeSnapshotSpec struct {
    // PersistentVolumeClaimName is the name of the PVC being snapshotted
    // +optional
    PersistentVolumeClaimName string `json:"persistentVolumeClaimName" protobuf:"bytes,1,opt,name=persistentVolumeClaimName"`

    // SnapshotDataName binds the VolumeSnapshot object with the VolumeSnapshotContent
    // +optional
    SnapshotDataName string `json:"snapshotDataName" protobuf:"bytes,2,opt,name=snapshotDataName"`

    // Name of the VolumeSnapshotClass required by the volume snapshot.
    // +optional
    VolumeSnapshotClassName string `json:"snapshotClassName" protobuf:"bytes,3,opt,name=snapshotClassName"`
}

type VolumeSnapshotStatus struct {
    // CreatedAt is the time the snapshot was successfully created. If it is set,
    // the snapshot was created; otherwise the snapshot was not created.
    // +optional
    CreatedAt *metav1.Time `json:"createdAt" protobuf:"bytes,1,opt,name=createdAt"`

    // AvailableAt is the time the snapshot was successfully created and became available
    // for use. A snapshot MUST have been created before it can become available, so if
    // AvailableAt is set the snapshot was also created.
    // When the snapshot has been created but is not yet available, the application can be
    // resumed if it was frozen before taking the snapshot. In this case,
    // it is possible that the snapshot is still being uploaded to the cloud. For example,
    // both GCE and AWS support uploading of the snapshot after it is cut as part of
    // the Create Snapshot process.
    // If AvailableAt is set, the snapshot is available; otherwise it is not.
    // +optional
    AvailableAt *metav1.Time `json:"availableAt" protobuf:"bytes,2,opt,name=availableAt"`

    // FailedAt is the time an error occurred during the snapshot creation (or uploading) process
    // +optional
    FailedAt *metav1.Time `json:"failedAt" protobuf:"bytes,3,opt,name=failedAt"`

    // A brief CamelCase string indicating details about why the snapshot is in the error state.
    // +optional
    Reason string

    // A human-readable message indicating details about why the snapshot is in the error state.
    // +optional
    Message string
}

```
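To make the lifecycle concrete, the following is a sketch of what a bound `VolumeSnapshot` might look like once the snapshot has been cut and become available. The JSON field names follow the type definitions above, the `v1alpha1` API group matches the examples later in this document, and all object names and timestamps are purely illustrative.

```yaml
apiVersion: volumesnapshot.csi.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: snapshot-pvc-1
  namespace: default
spec:
  snapshotClassName: csi-example-snapclass      # illustrative VolumeSnapshotClass name
  persistentVolumeClaimName: pvc-1              # the PVC being snapshotted
  snapshotDataName: snapcontent-pvc-1-0001      # set by the controller when bound to a VolumeSnapshotContent
status:
  createdAt: 2018-04-11T10:00:00Z               # the snapshot was cut
  availableAt: 2018-04-11T10:00:30Z             # the snapshot finished uploading and is usable
```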
#### The `VolumeSnapshotContent` Object

```GO

// +genclient=true
// +nonNamespaced=true

// VolumeSnapshotContent represents the actual "on-disk" snapshot object
type VolumeSnapshotContent struct {
    metav1.TypeMeta `json:",inline"`
    // +optional
    metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

    // Spec represents the desired state of the snapshot
    // +optional
    Spec VolumeSnapshotContentSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"`
}

// VolumeSnapshotContentList is a list of VolumeSnapshotContent objects
type VolumeSnapshotContentList struct {
    metav1.TypeMeta `json:",inline"`
    Metadata metav1.ListMeta `json:"metadata"`
    Items []VolumeSnapshotContent `json:"items"`
}

// VolumeSnapshotContentSpec is the desired state of the volume snapshot content
type VolumeSnapshotContentSpec struct {
    // Source represents the location and type of the volume snapshot
    VolumeSnapshotSource `json:",inline" protobuf:"bytes,1,opt,name=volumeSnapshotSource"`

    // VolumeSnapshotRef is part of the bi-directional binding between VolumeSnapshot
    // and VolumeSnapshotContent
    // +optional
    VolumeSnapshotRef *core_v1.ObjectReference `json:"volumeSnapshotRef" protobuf:"bytes,2,opt,name=volumeSnapshotRef"`

    // PersistentVolumeRef represents the PersistentVolume that the snapshot has been
    // taken from
    // +optional
    PersistentVolumeRef *core_v1.ObjectReference `json:"persistentVolumeRef" protobuf:"bytes,3,opt,name=persistentVolumeRef"`
}

// VolumeSnapshotSource represents the actual location and type of the snapshot.
// Only the CSI volume snapshot source is supported now.
type VolumeSnapshotSource struct {
    // CSI (Container Storage Interface) represents storage that is handled by an external CSI Volume Driver (Alpha feature).
    // +optional
    CSI *CSIVolumeSnapshotSource `json:"csiVolumeSnapshotSource,omitempty"`
}

// CSIVolumeSnapshotSource represents the source of a CSI volume snapshot
type CSIVolumeSnapshotSource struct {
    // Driver is the name of the driver to use for this snapshot.
    // Required.
    Driver string `json:"driver"`

    // SnapshotHandle is the unique snapshot id returned by the CSI volume
    // plugin’s CreateSnapshot to refer to the snapshot on all subsequent calls.
    // Required.
    SnapshotHandle string `json:"snapshotHandle"`

    // CreatedAt is the timestamp when the point-in-time snapshot was taken on the storage
    // system. The format of this field is a Unix nanoseconds time
    // encoded as an int64. On Unix, the command `date +%s%N` returns
    // the current time in nanoseconds since 1970-01-01 00:00:00 UTC.
    // This field is REQUIRED.
    CreatedAt int64 `json:"createdAt,omitempty" protobuf:"varint,3,opt,name=createdAt"`
}

```
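For illustration, a `VolumeSnapshotContent` object that an admin might create by hand to import a snapshot that already exists on the storage system (the static binding case described later) could look like the sketch below. The JSON field names follow the type definitions above, the `v1alpha1` API group matches the examples later in this document, and the driver name, snapshot handle, timestamps, and object names are placeholders.

```yaml
apiVersion: volumesnapshot.csi.k8s.io/v1alpha1
kind: VolumeSnapshotContent
metadata:
  name: static-snapshot-content
spec:
  csiVolumeSnapshotSource:
    driver: csi-hostpath
    snapshotHandle: 7a3c4d5e-0001          # ID of the pre-existing snapshot on the storage system
    createdAt: 1523440800000000000         # Unix time in nanoseconds reported by the storage system
  volumeSnapshotRef:                       # the VolumeSnapshot this content should bind to
    kind: VolumeSnapshot
    name: static-snapshot
    namespace: default
  persistentVolumeRef:                     # the PersistentVolume the snapshot was taken from
    kind: PersistentVolume
    name: pv-0001
```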
#### The `DataSource` Object in PVC

Add a new field to the PVC spec to represent the source of the data with which the provisioned volume will be prepopulated. Possible source types may include the following:

 * VolumeSnapshot: restore a snapshot to a new volume
 * PersistentVolumeClaim: clone an existing volume represented by a PVC

```

type PersistentVolumeClaimSpec struct {
    // If specified, volume will be prepopulated with data from the PVCDataSourceRef.
    // +optional
    PVCDataSourceRef *core_v1.LocalObjectReference `json:"dataSourceRef" protobuf:"bytes,2,opt,name=dataSourceRef"`
}

```

The existing LocalObjectReference in the core API will be modified to add a `Kind` field.

```

// LocalObjectReference contains enough information to let you locate the referenced object inside the same namespace.
type LocalObjectReference struct {
    //TODO: Add other useful fields. apiVersion, kind, uid?
    Name string
    // Kind indicates the type of the object reference.
    // +optional
    Kind string
}

```

In the first version, only VolumeSnapshot will be a supported `Kind` for the data source object reference. PersistentVolumeClaim will be added in a future version. If an unsupported `Kind` is used, the PV Controller SHALL bail out of the operation.


#### The `VolumeSnapshotClass` Object

A new VolumeSnapshotClass API object will be added to avoid mixing parameters between snapshots and volumes. Each CSI Volume Driver can have its own default VolumeSnapshotClass. If a VolumeSnapshotClass is not provided, the default will be used. It also allows new snapshot-specific parameters to be added.

```

type VolumeSnapshotClass struct {
    metav1.TypeMeta `json:",inline"`
    // +optional
    metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

    // Snapshotter is the driver expected to handle this VolumeSnapshotClass.
    // This value may not be empty.
    Snapshotter string

    // Parameters holds parameters for the snapshotter.
    // These values are opaque to the system and are passed directly
    // to the snapshotter. The only validation done on keys is that they are
    // not empty. The maximum number of parameters is
    // 512, with a cumulative max size of 256K.
    // +optional
    Parameters map[string]string
}

```

#### Add `DataSource` to `StorageClass`

DataSource will be added as an optional parameter to `StorageClass` to allow volumes belonging to the same `StorageClass` to be created with prepopulated data.

```

type StorageClass struct {
    // If specified, volume will be prepopulated with data from the DataSource
    // +optional
    PVCDataSource DataSource
}

```
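Putting these pieces together, restoring a snapshot into a new volume would be requested through a PVC that carries a data source reference. The sketch below assumes the proposed `dataSourceRef` field and `Kind` values described above (they do not exist in Kubernetes today); the StorageClass, snapshot, and PVC names are illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restored
spec:
  storageClassName: csi-example-sc    # a class backed by the CSI driver that took the snapshot
  dataSourceRef:                      # proposed field; references the snapshot to restore from
    kind: VolumeSnapshot
    name: database-snapshot-1
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```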

### CSI External Snapshot Controller

* The external snapshotter is responsible for creating/deleting snapshots and for binding VolumeSnapshot and VolumeSnapshotContent objects.

* The external snapshotter runs as a sidecar container along with the external-attacher and external-provisioner for each CSI Volume Driver.

* A dynamically created snapshot should have a VolumeSnapshotClass associated with it. The user can explicitly specify a VolumeSnapshotClass in the VolumeSnapshot API object. If the user does not specify one, the default VolumeSnapshotClass created by the admin will be used. This is similar to how a default StorageClass created by the admin is used for the provisioning of a PersistentVolumeClaim.

* For statically bound snapshots, the user/admin must set the references on both the VolumeSnapshot and the VolumeSnapshotContent objects so that the controller knows how to bind them.

![CSI Snapshot Diagram](csi-snapshot_diagram.png?raw=true "CSI Snapshot Diagram")


The external snapshotter is part of the Kubernetes implementation of the [Container Storage Interface (CSI)](https://github.com/container-storage-interface/spec). It is an external controller that monitors `VolumeSnapshot` and `VolumeSnapshotContent` objects and creates/deletes snapshots of volumes. The original Kubernetes CSI design can be found in the Kubernetes [design proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md).

The external snapshotter follows the [controller](https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md) pattern and uses informers to watch for `VolumeSnapshot` and `VolumeSnapshotContent` create/update/delete events. It filters `VolumeSnapshot` instances with `Snapshotter==<this driver's name>` and processes these events in workqueues with exponential backoff. Real handling is deferred to the `Handler` interface.

The snapshotter talks to the out-of-tree CSI Volume Driver over a socket (`/run/csi/socket` by default, configurable with `-csi-address`). The snapshotter tries to connect for `-connection-timeout` (1 minute by default), allowing the CSI Volume Driver to start and create its server socket a bit later.

The snapshotter creates a CreateSnapshotRequest and calls CreateSnapshot through the CSI ControllerClient. It gets a CreateSnapshotResponse from the out-of-tree CSI Volume Driver and creates a VolumeSnapshotContent API object with the VolumeSnapshotSource populated.

The snapshotter checks the status in the VolumeSnapshot to decide whether to bind, and binds the VolumeSnapshot and VolumeSnapshotContent when the snapshot is ready.

When the storage system fails to create a snapshot, no retry will be performed in the first version. This is because users may not want a retry when taking consistent snapshots or scheduled snapshots where the timing of the snapshot creation is important. In a future version, a maxRetries flag will be added to allow users to control whether retries are needed.
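As a deployment illustration only, the external snapshotter and the CSI Volume Driver could share the driver's socket within a single pod, for example as sketched below. The container images and the pod layout are assumptions (a real deployment would typically also include the external-provisioner and external-attacher sidecars plus the necessary RBAC rules); the `-csi-address` and `-connection-timeout` flags are the ones described above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: csi-example-controller
spec:
  containers:
    - name: csi-driver                                  # the CSI Volume Driver container (image is a placeholder)
      image: example.com/csi-example-driver:latest
      volumeMounts:
        - name: socket-dir
          mountPath: /run/csi                           # the driver creates its server socket here
    - name: external-snapshotter                        # sidecar watching VolumeSnapshot/VolumeSnapshotContent
      image: example.com/external-snapshotter:latest    # placeholder image
      args:
        - "-csi-address=/run/csi/socket"
        - "-connection-timeout=1m"
      volumeMounts:
        - name: socket-dir
          mountPath: /run/csi
  volumes:
    - name: socket-dir
      emptyDir: {}
```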

### Changes in PV Controller and CSI External Provisioner

Both the CSI External Provisioner and the in-tree PV Controller will be modified to support provisioning a volume from a snapshot data source.

`DataSource` is added to both `StorageClass` and `PersistentVolumeClaim` to represent the source of the data with which the provisioned volume is prepopulated. If `DataSource` is specified in both during volume provisioning, the `DataSource` in `PersistentVolumeClaim` overrides the `DataSource` in `StorageClass`.

The operation of provisioning a volume from a snapshot data source will be handled by the out-of-tree CSI External Provisioner.

The in-tree PV Controller will handle the binding of the PV and PVC once they are ready.


### CSI Volume Driver Snapshot Support

When CreateSnapshot is called on its CSI ControllerServer, the out-of-tree CSI Volume Driver creates a snapshot on the backend storage system or cloud provider and returns a CreateSnapshotResponse. The out-of-tree CSI Volume Driver needs to implement the following functions:

* CreateSnapshot, DeleteSnapshot, and create volume from snapshot if it supports CREATE_DELETE_SNAPSHOT.
* ListSnapshots if it supports LIST_SNAPSHOTS.

ListSnapshots can be an expensive operation because it will try to list all snapshots on the storage system. For a storage system that takes nightly periodic snapshots, the total number of snapshots on the system can be huge. Kubernetes should try to avoid this call if possible. Instead, calling ListSnapshots with a specific snapshot_id as a filter to query the status of a single snapshot is more desirable and efficient.

CreateSnapshot is a synchronous function and must block until the snapshot is cut. For cloud providers that support uploading the snapshot as part of the create snapshot operation, the CreateSnapshot function must also block until the snapshot is cut; after that it shall return an operation-pending gRPC error code until the uploading process is complete.

Creating a volume from a snapshot will be handled by the CreateVolume controller function in the CSI Volume Driver.

Refer to the [Container Storage Interface (CSI)](https://github.com/container-storage-interface/spec) specification for detailed instructions on how a CSI Volume Driver shall implement the snapshot functions.


## Transition to the New Snapshot Support

### Existing Implementation in External Storage Repo

For the snapshot implementation in the [external storage repo](https://github.com/kubernetes-incubator/external-storage/tree/master/snapshot), an external snapshot controller and an external provisioner need to be deployed.

* The old implementation does not support CSI volume drivers.
* A VolumeSnapshotClass is not needed to create a snapshot; this concept does not exist in the old design.
* To restore a volume from a snapshot, however, the user needs to create a new StorageClass that is different from the original one used for the PVC.

Here is an example yaml file to create a snapshot in the old design:

```yaml
apiVersion: volumesnapshot.external-storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: hostpath-test-snapshot
spec:
  persistentVolumeClaimName: pvc-test-hostpath
```

### New Snapshot Design for CSI

For the new snapshot model, a sidecar "Kubernetes to CSI" proxy container called "external-snapshotter" needs to be deployed in addition to the sidecar container for the external provisioner. This deployment model is shown in the CSI Snapshot Diagram in the CSI External Snapshot Controller section.

* The new design supports CSI volume drivers.
* To create a snapshot for CSI, a VolumeSnapshotClass can be created and specified in the spec of the VolumeSnapshot.
* To restore a volume from a snapshot, the user should use the same StorageClass that was used for the original PVC.

Here is an example of creating a VolumeSnapshotClass and creating a snapshot in the new design:

```yaml
apiVersion: volumesnapshot.csi.k8s.io/v1alpha1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-snapclass
snapshotter: csi-hostpath
---
apiVersion: volumesnapshot.csi.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: snapshot-demo
spec:
  snapshotClassName: csi-hostpath-snapclass
  persistentVolumeClaimName: hpvc
```

To transition from the old model to the new model, the user needs to stop using the old deployment, and follow the new design to deploy the external snapshotter and create VolumeSnapshots.
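For completeness, a hedged sketch of the restore path in the new design follows, combining the `snapshot-demo` example above with the proposed PVC `dataSourceRef` field from this proposal. Per the new design, the restored PVC uses the same StorageClass as the original `hpvc` claim; the StorageClass name and size here are illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hpvc-restore
spec:
  storageClassName: csi-hostpath-sc    # same StorageClass used by the original PVC (name is illustrative)
  dataSourceRef:                       # proposed field from the PVC API change above
    kind: VolumeSnapshot
    name: snapshot-demo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```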
diff --git a/contributors/design-proposals/storage/csi-snapshot_diagram.png b/contributors/design-proposals/storage/csi-snapshot_diagram.png
new file mode 100644
index 00000000000..e040126e15f
Binary files /dev/null and b/contributors/design-proposals/storage/csi-snapshot_diagram.png differ