diff --git a/design/handle-backup-of-volumes-by-resources-filters.md b/design/handle-backup-of-volumes-by-resources-filters.md
index 15a07c6270..014548d360 100644
--- a/design/handle-backup-of-volumes-by-resources-filters.md
+++ b/design/handle-backup-of-volumes-by-resources-filters.md
@@ -1,164 +1,165 @@
 # Handle backup of volumes by resources filters
 
 ## Abstract
-Currently, Velero doesn't have one flexible way to filter volumes.
+Currently, Velero doesn't have one flexible way to handle volumes.
 
-If users want to skip backup of volumes or only backup some volumes in different namespaces in batch, currently they need to use the opt-in and opt-out approach one by one, or use label-selector but if it has big different labels on each different related pods, which is cumbersome when they have lots of volumes to handle with. it would be convenient if Velero could provide one way to filter the backup of volumes just by `some specific volumes attributes`.
-
-Also, currently, it's not accurate enough if the users want to select a specific volume to do a backup or skip by without patching labels or annotations to the pods. It would be useful if users could accurately select target volume by `one specific resource selector`. Users could accurately select the volume to backup or skip in their own console when using velero for secondary development.
+If users want to skip the backup of volumes, or back up only some volumes in different namespaces in batch, they currently need to use the opt-in and opt-out approach one by one, or use a label selector, which is cumbersome when the related pods carry very different labels and there are lots of volumes to handle. It would be convenient if Velero could provide policies to handle the backup of volumes just by `some specific volumes conditions`.
 
 ## Background
-As of Today, Velero has lots of filters to handle (backup or skip backup) resources including resources filters like `IncludedNamespaces, ExcludedNamespaces`, label selectors like `LabelSelector, OrLabelSelectors`, annotation like `backup.velero.io/must-include-additional-items` etc. But it's not enough flexible to handle volumes, we need one generic way to filter volumes.
+As of today, Velero has lots of filters to handle (backup or skip backup) resources, including resources filters like `IncludedNamespaces, ExcludedNamespaces`, label selectors like `LabelSelector, OrLabelSelectors`, annotations like `backup.velero.io/must-include-additional-items`, etc. But they are not flexible enough to handle volumes; we need one generic way to handle volumes.
 
 ## Goals
-- Introducing one flexible way to filter volumes.
+- Introduce flexible policies to handle volumes without patching any labels or annotations onto the pods or volumes.
 
-## Non Goals
+## Non-Goals
 - We only handle volumes for backup and do not support restore.
-- Currently, only handle volumes, does not support other resources.
+- Currently, we only handle volumes and do not support other resources.
 - Only environment-unrelated and platform-independent general volumes attributes are supported, do not support volumes attributes related to a specific environment.
 
 ## Use-cases/Scenarios
-### A. Skip backup volumes by some attributes
+### Skip backup volumes by some attributes
 Users want to skip PV with the requirements:
 - option to skip all PV data
 - option to skip specified PV type (RBD, NFS)
 - option to skip specified PV size
 - option to skip specified storage-class
-- option to skip folders
-
-### B. Accurately select target volume to backup
-Some volumes are only used for logging while others volumes are for DBs, and only need to backup DBs data. users need `one specific resource seletor` to accurately select target volumes to backup.
 
 ## High-Level Design
-Add a new flag `pv-backup-policy-configmap` when executing `velero backup create`, which imports the defined resources filters in one YAML file. the YAML file including all defined filter rules for the current backup.
+Add a new flag `backup-policy-from-file` when executing `velero backup create`, which imports the defined volume resources policies in one YAML file. The YAML file defines the volume resources policies for the current backup.
 
-When Velero handles volumes backup should respect the filter rules defined in the imported YAML file.
+When Velero handles volume backups, it should respect the policies defined in the imported YAML file.
 
 ## Detailed Design
-The resources filters rules should contain both `include` and `exclude` rules.
-
-For the rules on `one specific resource selector`, we introduced a `GVRN` way of resources filters, for resources are identified by their resource type and resource name, or GVRN.
+The volume resources policies should contain a list of policies, each of which is a combination of conditions and a related `action`. When target volumes meet the conditions, the related `action` takes effect.
 
-Here we call it `GVRN Selector` which exactly matches the resources to be handled.
+Below is the API Design for the user configuration:
 
-For the attributes on `some specific volumes attributes`, we basically follow the defined data struct [PersistentVolumeSpec](https://github.com/kubernetes/kubernetes/blob/v1.26.0/pkg/apis/core/types.go#L304), and only handle partial common fields of it currently.
-
-Here we call it `Volumes Attributes Selector`, which matches volumes with the same attributes defined.
-
-### filter fields format
-The filter YAML config file would look like this:
+### API Design
+```go
+// Action defines one action for a specific way of backup
+type Action struct {
+    // Type defines the specific type of action, it could be 'file-system-backup', 'volume-snapshot', or 'skip' currently
+    Type string `yaml:"type,omitempty"`
+    // Parameters defines a map of parameters used when executing a specific action
+    // +optional
+    // +nullable
+    Parameters map[string]interface{} `yaml:"parameters,omitempty"`
+}
+
+// VolumePolicy defines the conditions to match Volumes and the related action to handle matched Volumes
+type VolumePolicy struct {
+    // Conditions defines the conditions to match Volumes
+    Conditions map[string]interface{} `yaml:"conditions,omitempty"`
+    Action Action `yaml:"action,omitempty"`
+}
+
+// ResourcePolicies currently defines a slice of volume policies to handle backup
+type ResourcePolicies struct {
+    VolumePolicies []VolumePolicy `yaml:"volumePolicies,omitempty"`
+    // we may support other resource policies in the future, and they could be added separately
+    // OtherResourcePolicies: []OtherResourcePolicy
+}
 ```
----
-resources:
-  - include:
-      groupResource: "/persistentvolumes"
-      namespacedNames: "/nginx-logs"
-  - include:
-      groupResource: "/persistentvolumes"
-      namespacedNames: "/minio"
-  - exclude:
-      groupResource: apps/deployments
-      namespacedNames: velero/velero
-  - exclude:
-      groupResource: apps/demonset
-      namespacedNames: velero/node-agent
-storage:
-  pv:
-    - include:
-        storageClassName: gp2, ebs-sc
-        volumeMode: block, filesystem
-        capacity: OGi,5Gi
-        persistentVolumeSource:
-          nfs:
-            readOnly: true
-          csi:
-            driver: aws.efs.csi.driver
-    - include:
-        storageClassName: io1
-        persistentVolumeSource:
-          csi:
-            driver: aws.efs.csi.driver
-    - exclude:
-        storageClassName: efs-dynamic-sc
-        capacity: 5Gi,
-        persistentVolumeSource:
-          nfs: {}
-          csi:
-            driver: aws.ebs.csi.driver
-    - exclude:
-        storageClassName: fsx-lustre
-        volumeMode: block, filesystem
-        persistentVolumeSource:
-          csi:
-            driver: csi.aws.amazon.com
+
+The policies YAML config file would look like this:
+```yaml
+version: v1
+volumePolicies:
+# it's a list, and if the input item matches one policy, the later ones are ignored
+# each policy consists of a list of conditions and an action
+
+# each key in the object is one condition, and one policy will apply to resources that meet ALL conditions
+- conditions:
+    # capacity condition matches the volumes whose capacity falls into the range
+    capacity: "0,100Gi"
+    csi:
+      driver: aws.ebs.csi.driver
+      fsType: ext4
+    storageClass:
+    - gp2
+    - ebs-sc
+  action:
+    type: volume-snapshot
+    parameters:
+      # optional parameters which are custom-defined parameters when doing an action
+      volume-snapshot-timeout: "6h"
+- conditions:
+    capacity: "0,100Gi"
+    storageClass:
+    - gp2
+    - ebs-sc
+  action:
+    type: file-system-backup
+- conditions:
+    nfs: {}
+  action:
+    # type of file-system-backup could be defined a second time
+    type: file-system-backup
+- conditions:
+    csi:
+      driver: aws.efs.csi.driver
+  action:
+    type: skip
 ```
 
 ### Filter rules
-The whole filter file consists of two parts: resources and storage.
-
-Both `Kopia, Restic` and `Volume snapshot` share one YAML configuration file.
-
-#### resources
-In the resources part, we defined `GVRN Selector` to filter resources. In a filter, an empty or omitted group, version, resource type, or resource name matches any value. `GVRN selector` could match Persistent Volume and other Kubernetes resources.
+#### VolumePolicies
+The whole resource policies consist of groups of volume policies.
 
-Taking select PV as an example, if users want to backup PV with name nginx-logs, the `groupResource` could be "/persistentvolumes" in which the group should be empty, the `namespacedNames` could be "/nginx-logs" in which the namespace should be empty.
+One specific volume policy is a combination of one action and several conditions, which means one action and several conditions are the smallest unit of a volume policy.
 
-#### storage
-In the storage part, we defined `Volumes Attributes Selector` to filter resources.
-
-The storage part defined rules including `pv` and `volume`, which correspond to `Kopia, Restic` and `Volume snapshot`.
-
-A filter in storage with a specific key and empty value, which means the value matches any value. For example, if the `storage.pv.exclude.persistentVolumeSource.nfs` is `{}` it means if `NFS` is used as `persistentVolumeSource` in Persistent Volume will be skipped no matter what the NFS server or NFS Path is,
-
-A filter may have multiple values, all the values are concatenated by commas. For example, the `storage.pv.include.storageClassName` is `gp2, ebs-sc` which means Persistent Volume with gp2 or ebs-sc storage class both will be back up.
-
-The size of each single filter value should limit to 256 bytes in case of an unfriendly long variable assignment.
+Volume policies are a list, and if a target volume matches one policy, the later ones are ignored, which reduces the complexity of matching volumes, especially when there are multiple complex volume policies.
+
+#### Action
+`Action` defines one action for a specific way of backup:
+  - if choosing `Kopia` or `Restic`, the action value would be `file-system-backup`.
+  - if choosing volume snapshot, the action value would be `volume-snapshot`.
+  - if choosing to skip the backup of a volume, the action value would be `skip`, and the volume is skipped whether the backup would otherwise use `file-system-backup` or `volume-snapshot`.
+
+The policies could be extended later for other ways of backup, which means other `Action` values may be assigned in the future.
+
+`file-system-backup`, `volume-snapshot`, and `skip` could each be partially or fully configured in the YAML file, and a configuration takes effect only for the related action.
+
+#### Conditions
+The conditions are a series of volume attributes; a matched Volume should meet all the volume attributes in one conditions configuration.
+
+##### Parameters
+Parameters are optional for one specific action. For example, it could be `csi-snapshot-timeout: 6h` for CSI snapshot.
+
+#### Special rule definitions:
+- An attribute selector in `AttributeSelectors` with a specific key and an empty value matches any value. For example, if `conditions.nfs` is `{}`, any Persistent Volume that uses `NFS` as its `persistentVolumeSource` matches, no matter what the NFS server or NFS path is.
 
-If user defined pv filter rules but used Kopia or Restic to do a backup, the backup will fail in validating the resource filter configuration. Same as the situation if using defined volume filter rules but using CSI or plugins to take volume snapshots.
+- The size of each single filter value should be limited to 256 bytes to guard against an unfriendly long variable assignment.
 
-For capacity in `pv` or size in `volume`, the value should include the lower value and upper value concatenated by commas. And it has several combinations below:
-- "0,5Gi" or "0Gi,5Gi" which means capacity or size matches from 0 to 5Gi, including value 0 and value 5Gi
-- ",5Gi" which is equal to "0,5Gi"
-- "5Gi," which means capacity or size matches larger than 5Gi, including value 5Gi
-- "5Gi" which is not supported and will be failed in validating configuration.
+- For the capacity of a PV or the size of a Volume, the value should include the lower value and upper value joined by a comma. It has the following combinations:
+    - "0,5Gi" or "0Gi,5Gi" which means capacity or size matches from 0 to 5Gi, including value 0 and value 5Gi
+    - ",5Gi" which is equal to "0,5Gi"
+    - "5Gi," which means capacity or size matches 5Gi and larger, including value 5Gi
+    - "5Gi" which is not supported and will fail configuration validation.
 
-### Filter Reference
-Currently, resources filters are defined in `BackupSpec` struct, it will be more and more bloated with adding more and more filters which makes the size of `Backup` CR bigger and bigger, so we want to store the resources rules in configmap, and `Backup` CRD reference to current configmap.
+### Configmap Reference
+Currently, resources filters are defined in the `BackupSpec` struct, which will become more and more bloated as more filters are added, making the `Backup` CR bigger and bigger. So we want to store the resources policies in a configmap and have the `Backup` CRD reference the current configmap.
 
 the `configmap` would be like this:
-```
+```yaml
 apiVersion: v1
 data:
-  filter.YAML: |-
-    {
-      "resources": {
-        "include": {
-          ...
-        },
-        "exclude": {
-          ...
-        }
-      },
-      "storage": {
-        "pv": {
-          "include": {
-            ...
-          },
-          "exclude": {
-            ...
-          }
-        },
-        "volume": {
-          "include": {
-            ...
-          },
-          "exclude": {
-            ...
-          }
-        }
-      }
-    }
+  policies.yaml:
+    ----
+    version: v1
+    volumePolicies:
+      - conditions:
+          capacity: "0,100Gi"
+          csi:
+            driver: aws.ebs.csi.driver
+            fsType: ext4
+          storageClass:
+            - gp2
+            - ebs-sc
+        action:
+          type: volume-snapshot
+          parameters:
+            volume-snapshot-timeout: "6h"
 kind: ConfigMap
 metadata:
   creationTimestamp: "2023-01-16T14:08:12Z"
@@ -168,31 +169,52 @@ metadata:
   uid: b73e7f76-fc9e-4e72-8e2e-79db717fe9f1
 ```
 
-a new variable `filterConfigmap` would be added into `BackupSpec`, it's value is assigned with current resources filters configmap
-```
+A new variable `resourcePolices` would be added into `BackupSpec`; its value is assigned with the current resources policies configmap
+```yaml
 apiVersion: velero.io/v1
 kind: Backup
 metadata:
   name: backup-1
 spec:
-  resourcesFilter:
+  resourcePolices:
     refType: Configmap
     ref: backup01
   ...
 ```
 
-The configmap basically equivalent to generated by command `kubectl create cm backup01 --from-file filter.yaml`
+The configmap is basically equivalent to what is generated by the command `kubectl create cm backup01 --from-file policies.yaml`
 
-The configmap only stores those filters assigned value not the whole resources filters.
+The configmap only stores those assigned values, not the whole resources policies.
 
 The name of the configmap is `$BackupName`, and it's in Velero install namespace.
 
-#### Life-cycle of resource filter configmap
-- Resource filter configmap only been generated once Velero backup command with flag `pv-backup-policy-configmap` to import configuration.
-- If the referenced backup has been deleted, the relevant configmap should be removed.
-
-### Display of volume resource filter
-As the resource filter configmap is referenced by backup CR, the rules in configmap are not so intuitive, so we need to integrate rules in configmap to the output of the command `velero backup describe`, and make it more readable.
+#### Life-cycle of resources policies configmap
+- The resource policies configmap is generated only when the Velero backup command is run with the flag `backup-policy-configmap` to import the configuration.
+- The resource policies configmap could be referenced by another backup when the Velero backup command is run with the flag `policy-from-backup`, which references the configuration in another backup.
+- If all the backups referencing it have been deleted, the relevant configmap should be removed.
+#### Versioning
+Here we introduce the version field in the YAML config to contain breaking changes in case of compatibility problems:
+```yaml
+version: v1
+volumePolicies:
+  ....
+```
+#### Multiple versions supporting
+To limit the maintenance effort, we will only support one version of the data in Velero. Suppose there is a breaking change for the YAML in Velero v1.13: we should bump the config version to v2, and only v2 is supported in v1.13. Existing data with `version: v1` should be migrated when Velero starts up; this won't hurt the existing backup schedule CR, as it only references the configmap. To make the migration easier, the configmap for such resource filter policies should be labeled manually before Velero starts up, as shown here, and Velero will migrate the labeled configmap.
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  labels:
+# This label can be optional but if this is not set, the backup will fail after the breaking change and the user will need to update the data manually
+    velero.io/resource-filter-policies: "true"
+  name: example
+  namespace: velero
+data:
+  .....
+```
+### Display of resources policies
+As the resource policies configmap is only referenced by the backup CR, the policies in the configmap are not easy to see, so we need to integrate the policies in the configmap into the output of the command `velero backup describe` and make them more readable.
 
 ## Compatibility
 Currently, we have these resources filters:
@@ -209,9 +231,17 @@ Currently, we have these resources filters:
 - backup.velero.io/backup-volumes
 - backup.velero.io/must-include-additional-items
 
-So it should be careful with the combination of volumes resources filter rules and the above resources filters.
-- When volumes resources filter rules conflict with the above resources filters, we should respect the above resources filters. For example, if the user used the opt-out approach to `backup.velero.io/backup-volumes-excludes` annotation on the pod and also defined include volume in volumes resources filters configuration, we should respect the opt-out approach to skip backup of the volume.
-- The filtered resources would be the intersection of the result with the above resources filters and volumes resources filters. For example, if user defined `IncludedNamespaces=nginx-example` and also included PV with `storageClassName=gp2`, which results in backing up the volume in nginx-example.
-
+So we should be careful with the combination of volume resources policies and the above resources filters.
+- When volume resource policies conflict with the above resource filters, we should respect the above resource filters. For example, if the user used the opt-out approach with the `backup.velero.io/backup-volumes-excludes` annotation on the pod and also defined a policy that backs up the same volume in the volume resources policies configuration, we should respect the opt-out approach and skip the backup of the volume.
+- The volumes actually handled are the intersection of the results of the above resources filters and the volume resources policies. For example, if the user defined `IncludedNamespaces=nginx-example` and also included PVs with `storageClassName=gp2`, only the volumes with that storage class in nginx-example are backed up.
+- If volume resources policies conflict with themselves, the result contains only the non-conflicting parts.
+
 ## Implementation
-This implementation should be included in Velero v1.11.0
\ No newline at end of file
+This implementation should be included in Velero v1.11.0
+
+## Alternatives Considered
+Here we let the user define the YAML config file and store the resources policies in a configmap. Alternatively, we could define a resource policies CRD and store the policies imported from the user-defined config file in the related CR.
+
+## Open Issues
+
+Should we support more than one version of the filter policies configmap?
\ No newline at end of file
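
Review note: the capacity range rule and the first-match ordering described in this change can be sketched in Go as follows. This is only an illustration under stated assumptions, not the implementation proposed by the design: `capacityRange`, `parseSize`, `parseCapacityRange`, `policy`, and `actionFor` are hypothetical names, and the size parser only understands plain integers with an optional `Ki`, `Mi`, `Gi`, or `Ti` suffix rather than full Kubernetes quantities.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// capacityRange is a hypothetical helper type; both bounds are inclusive.
type capacityRange struct {
	lower int64 // bytes
	upper int64 // bytes; -1 means no upper bound
}

// parseSize converts strings such as "5Gi" or "0" to bytes (simplified assumption).
func parseSize(s string) (int64, error) {
	suffixes := []struct {
		name string
		mult int64
	}{{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40}}
	for _, sfx := range suffixes {
		if strings.HasSuffix(s, sfx.name) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, sfx.name), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * sfx.mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64)
}

// parseCapacityRange accepts "lower,upper", ",upper", and "lower,".
// A value without a comma, such as "5Gi", is rejected.
func parseCapacityRange(s string) (capacityRange, error) {
	parts := strings.Split(s, ",")
	if len(parts) != 2 {
		return capacityRange{}, fmt.Errorf("capacity %q must be \"lower,upper\"", s)
	}
	r := capacityRange{lower: 0, upper: -1}
	var err error
	if lo := strings.TrimSpace(parts[0]); lo != "" {
		if r.lower, err = parseSize(lo); err != nil {
			return capacityRange{}, err
		}
	}
	if hi := strings.TrimSpace(parts[1]); hi != "" {
		if r.upper, err = parseSize(hi); err != nil {
			return capacityRange{}, err
		}
	}
	if r.upper >= 0 && r.lower > r.upper {
		return capacityRange{}, fmt.Errorf("capacity %q has lower > upper", s)
	}
	return r, nil
}

// contains reports whether size (in bytes) falls inside the range.
func (r capacityRange) contains(size int64) bool {
	return size >= r.lower && (r.upper < 0 || size <= r.upper)
}

// policy is a hypothetical, capacity-only stand-in for one volume policy.
type policy struct {
	capacity capacityRange
	action   string
}

// actionFor returns the action of the first matching policy; later ones are ignored.
func actionFor(policies []policy, size int64) string {
	for _, p := range policies {
		if p.capacity.contains(size) {
			return p.action
		}
	}
	return ""
}

func main() {
	small, _ := parseCapacityRange("0,5Gi")
	large, _ := parseCapacityRange("5Gi,")
	policies := []policy{{small, "volume-snapshot"}, {large, "file-system-backup"}}
	fmt.Println(actionFor(policies, 5<<30))  // 5Gi matches both; the first policy wins
	fmt.Println(actionFor(policies, 10<<30)) // only the second policy matches
}
```

In this sketch a volume of exactly 5Gi is handled by the first policy because both bounds are inclusive and matching stops at the first hit, which mirrors the ordering rule stated under `VolumePolicies`.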