# Design for Handling backup of volumes by resources filters #5773
Codecov Report

```diff
@@           Coverage Diff            @@
##             main    #5773      +/-   ##
==========================================
+ Coverage   40.67%   40.90%   +0.23%
==========================================
  Files         243      248       +5
  Lines       21035    21554     +519
==========================================
+ Hits         8555     8817     +262
- Misses      11857    12100     +243
- Partials      623      637      +14
```
hi @pradeepkchaturvedi, here is a draft design PR for handling backup of volumes by resource filters. If you have any opinions, please comment on it.
- option to skip folders

### B. Accurately select target volume to backup
Some volumes are only used for logging while other volumes are for DBs, and only the DB data needs to be backed up. Users need `one specific resource selector` to accurately select the target volumes to back up.
How do we support that if these two PVs have the same storage class?
For those PVs that have the same attributes and are hard to distinguish, we could use the GVRN way to accurately select PVs to back up or skip, which would be a group of combinations of `groupResource` + `namespacedNames`.
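A rough illustration of what such a GVRN entry could look like (the key names below are hypothetical; only the `groupResource` + `namespacedNames` combination comes from the comment above):

```yaml
# hypothetical shape of a GVRN selector; key names are illustrative
- groupResource:
    group: ""                    # core API group
    resource: persistentvolumes
  namespacedNames:
  - my-app-ns/data-pvc           # pick exactly this claim's volume
```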
#### Versioning
Here we introduce the version field in the YAML config to contain breaking changes in case of compatibility problems:
```yaml
version: v1
```
What was the reason to choose a ConfigMap over a new CRD for the resource policy? A CRD would allow API versioning of the policy resources.
I've added more details to the design now.
The reason we chose a ConfigMap is: a CRD is more like a kind of resource with a status, and the Kubernetes API server handles the lifecycle of a CR, moving it through different statuses. Compared to a CRD, a ConfigMap is more focused on storing data, and we only want to store policies, so a ConfigMap is more suitable.

> CRD would allow API versioning

That also needs our Velero code to interpret it, so it may be no different from using a ConfigMap: we don't change the version of the ConfigMap, but maintain the version in the data part of the ConfigMap.
We did think about adding a CR, but favor a CM over a CR for 2 reasons:
- We don't want to introduce the full set of subcommands immediately to manage the lifecycle of these CRs; we may make it a CR in the future if this approach is verified.
- There's also the debate that this piece of data is purely for reference, and we don't want to introduce a controller to reconcile the status if we make it a CR.
The `status` subresource is optional. Policy engines like OPA and Kyverno use CRDs to represent policies. With a CRD, at least users are familiar with the corresponding K8s versioning scheme.
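For context, the ConfigMap approach described above would carry the versioned policy inside `data`, roughly like this (the ConfigMap name and the `volumePolicies` key are assumptions for the sketch, not from the design text):

```yaml
# hedged sketch: versioned policy YAML stored in a ConfigMap's data,
# so the version lives in the data rather than in the API version
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-volume-policy   # illustrative name
  namespace: velero
data:
  policy.yaml: |
    version: v1
    volumePolicies:            # key name is an assumption
    - conditions:
        csi:
          driver: aws.efs.csi.driver
      action:
        type: skip
```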
```yaml
- conditions:
    csi:
      driver: aws.efs.csi.driver
```
I assume there will be a catch-all condition that we can use at the end? For example, if the user wants to skip all PVs except storage class `abc`.
Whether we should have a catch-all condition depends on user configuration. If users configure it, Velero will support it.
The current behavior is that ignored volumes will not be backed up.
Suppose we have volumes A, B, C, and D; through all kinds of resource filters, A and B are defined to be included and D is excluded. Velero will back up volumes A and B, and D won't be backed up; the ignored C will not be backed up by default.
Yes, I think that's a good enhancement, thanks @rnarenpujari
> The current behavior is that ignored volumes will not be backed up. Suppose we have volumes A, B, C, and D; through all kinds of resource filters, A and B are defined to be included and D is excluded. Velero will back up volumes A and B, and D won't be backed up; the ignored C will not be backed up by default.

OK, skip was not a good example. If we want to snapshot storage class `abc` but use restic for everything else, then a catch-all/wildcard condition at the end would be nice to have, I think.
I assume in that case, you just define 2 conditions: one for storage class `abc` with `volume-snapshot`, and the wildcard condition with `restic`.

@qiuming-best how would one represent a policy with wildcard conditions?
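One hedged sketch of what that could look like, assuming an empty `conditions` map acts as the wildcard (that behavior is an assumption, not something the design confirms) and relying on the first-match-wins ordering described later in the design:

```yaml
version: v1
volumePolicies:                  # key name is illustrative
# the specific policy comes first, since the first matching policy wins
- conditions:
    storageClass:
    - abc
  action:
    type: volume-snapshot
# assumed wildcard: empty conditions match every remaining volume
- conditions: {}
  action:
    type: file-system-backup
```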
@reasonerjt @qiuming-best I am not convinced that we need to special-case the filtering of persistent volumes. The introduction of a non-K8s API and corresponding versioning scheme to represent volume policies feels a bit much. Feels like all the … Can you elaborate a bit more on the …?

The field selector mechanism proposed in #5842 uses the following:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-backup
  namespace: velero
spec:
  includedClusterScopeResources:
  - groupResource:
      group: "v1"
      resource: "persistentvolumes"
    selectors:
      capacity: "0,100Gi"
      csi:
        driver: aws.ebs.csi.driver
        fsType: ext4
      storageClass:
      - gp2
      - ebs-sc
  excludedClusterScopeResources:
  - groupResource:
      group: "v1"
      resource: "persistentvolumes"
    selectors:
      csi:
        driver: aws.efs.csi.driver
  - groupResource:
      group: "v1"
      resource: "persistentvolumes"
    selectors:
      nfs: {}
```

Building on top of #5838, I believe this approach can significantly reduce the implementation and maintenance scope, where we just need to focus on the handling and conversion of the field selectors. (K8s only supports a handful of field selectors on the server-side.) Even if we think the proposed volume policy is necessary, I don't think #5842 contradicts this proposal. It addresses issues like #5152 and #5118 which this proposal doesn't. Based on our discussion on the community call, I would like to get a sense of how this policy can be extended to include other cluster-scoped resources like CRDs, …
@ihcsim The reason there's the `action` and `parameters` fields is that we wanna make sure there're flexibilities so that the user can choose to do more actions against them in addition to include/exclude.

The fieldSelector in #5842 may be sufficient for filtering resources based on names, but as for PV there are quite a few fields, and nested structures, in the spec. In our internal discussion, we found matching the values of the fields does not satisfy the use case; for example, capacity is a range in the policy data, and to apply the policy against a PV we do not simply check if the value in the selector is the same as in the spec, therefore we introduce the condition, which does not have to be a 1:1 mapping to each field in the spec.

I agree the policy proposed in the PR does not contradict #5842. I was thinking we may extend the policy to add …
```go
// VolumePolicy defines the conditions to match volumes and the related action to handle matched volumes
type VolumePolicy struct {
	// Conditions defines the list of conditions to match volumes
	Conditions map[string]interface{} `yaml:"conditions"`
	// Action defines the action to take on matched volumes
	Action Action `yaml:"action"`
}
```
> In our internal discussion, we found matching the values of the fields does not satisfy the use case; for example, capacity is a range in the policy data, and to apply the policy against a PV we do not simply check if the value in the selector is the same as in the spec, therefore we introduce the condition, which does not have to be a 1:1 mapping to each field in the spec.

@reasonerjt As proposed, the field selector approach can be extended (client-side) to work like this. AIUI, there aren't ways to do server-side filtering for fields like `capacity`, right? We will have to retrieve all the PVs and then apply the filters on the client-side. If that's true, the underlying implementation will be the same regardless of whether it's `fields.Fields`, `field.Selector`, or `Conditions` in the API. (Fields like `name`, `namespace` and `status` will work without additional implementation because K8s knows how to filter by those fields.)
Are you suggesting we should introduce a 1:1 mapping to the PV, or put this chunk of data into the backup CR?
IMO, a 1:1 mapping may be sufficient, but a more flexible condition map can be easier to create and maintain; I don't see a very strong reason we must keep a 1:1 mapping to the fields of the PV resource.
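To make the range-vs-equality point concrete, here is a minimal sketch (not Velero's implementation; `capacityInRange` is a hypothetical helper) of evaluating a `capacity: "0,100Gi"` condition, which needs a membership check rather than a 1:1 field comparison:

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
)

// capacityInRange reports whether a PV capacity (e.g. "50Gi") falls within a
// policy range written as "lower,upper" (e.g. "0,100Gi").
func capacityInRange(pvCapacity, policyRange string) (bool, error) {
	bounds := strings.SplitN(policyRange, ",", 2)
	if len(bounds) != 2 {
		return false, fmt.Errorf("invalid range %q", policyRange)
	}
	q, err := resource.ParseQuantity(pvCapacity)
	if err != nil {
		return false, err
	}
	lower, err := resource.ParseQuantity(strings.TrimSpace(bounds[0]))
	if err != nil {
		return false, err
	}
	upper, err := resource.ParseQuantity(strings.TrimSpace(bounds[1]))
	if err != nil {
		return false, err
	}
	// membership check: lower <= capacity <= upper
	return q.Cmp(lower) >= 0 && q.Cmp(upper) <= 0, nil
}

func main() {
	ok, _ := capacityInRange("50Gi", "0,100Gi")
	fmt.Println(ok) // true: 50Gi lies within [0, 100Gi]
}
```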
Volume policies are a list, and if the target volumes match the first policy, the latter ones will be ignored, which reduces the complexity of matching volumes, especially when there are multiple complex volume policies.

#### Action
> The reason there's the `action` and `parameters` fields is that we wanna make sure there're flexibilities so that the user can choose to do more actions against them in addition to include/exclude.

@reasonerjt Seems to me there are potentially "filtering" and "grouping" requirements. If grouping is the main use case, where we want to apply certain actions to a group of volumes, I wonder if a volume group CRD might be clearer and more structured than `data` in a ConfigMap. The `Backup` CR can reference the volume groups, instead of the ConfigMap.

If grouping and filtering can be tackled separately, the `Backup` CRD spec already has filter-related fields. We can change them from `[]string` to a struct to fulfill the `Conditions` requirement. We should keep #5842, which goes with #5838, separate from this proposal.
It's not about grouping, it's about variety of actions; we wanna support actions other than "include/exclude".

As for putting the filter in or out of the Backup CR, we wanna put it out of the CR for two reasons:
- Control the size of the backup CR. We've been continuously adding new fields to the CRD to add new features; despite it already being `v1`, I hope we can reduce such changes, and will open another issue to discuss the `v1 -> v2` transformation of velero CRDs.
- Make the policy re-usable across backups. There has been some requirement from other adopters that they wanna introduce a `policy` CR and use it with different backups; this is a step towards that direction.
"include/exclude" could not further policy distinct when one backup both have volume-snapshot
and file-system-backup
, so grouping
does not work, we introduced actions
way
Policy re-use sounds like a good use case to me. If the ultimate goal is a policy CR, why not start with a CR right from the start? As for not adding new fields to backup, I think you will have to add something to it anyway, even if it's just a `ref` (or a list of `ref`s), right?

Regardless, can we keep and review #5842, and possibly add it to #5838, as I feel like it's different from the policy concept in this proposal? Thanks.
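For illustration, such a `ref` could look roughly like this (the `resourcePolicy` field name and the ConfigMap name are assumptions for the sketch, not the settled design):

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-backup
  namespace: velero
spec:
  resourcePolicy:               # illustrative field name
    kind: configmap
    name: backup-volume-policy  # the reusable policy object
```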
LGTM, thanks @qiuming-best for putting this together.
added a few comments
```go
// Action defines one action for a specific way of backup
type Action struct {
	// Type defines the specific type of action; it could be 'file-system-backup', 'volume-snapshot', or 'skip' currently
	Type string `yaml:"type"`
}
```
@qiuming-best @reasonerjt If we support only a handful of action types, shouldn't `Action.Type` be some typed value instead of a string? Like:

```go
type VolumeActionType string

const (
	Skip           VolumeActionType = "skip"
	VolumeSnapshot VolumeActionType = "volume-snapshot"
	// etc.
)

type Action struct {
	Type VolumeActionType
}
```
ok, I'll modify it
## Implementation
This implementation should be included in Velero v1.11.0.

Currently, in Velero v1.11.0 we only support `Action`
Should we also add backup-level validation, where we check whether the specified action in the policy is supported by the Velero install instance or not? For example, if the user specified `fs-backup` but the Velero install does not have restic or kopia, then in that case we could fail the backup even before we start processing it? WYGT?
Users could configure many kinds of `Action` types; they can configure both `fs-backup` and `volume-snapshot` types of `Action`. When doing the backup for one specific volume, we will first check which types of `Action` the volume matched. If all policies validate, both `fs-backup` and `volume-snapshot` could apply; and if the current backup goes through the CSI plugin to take a snapshot, it will take a snapshot for the `volume-snapshot` match even though the user has configured another `fs-backup` for it.

If we add lots of validation related to the installation or environment, it will make our validation complicated, and some redundant configuration by users is also acceptable.
- backup.velero.io/must-include-additional-items

So we should be careful with the combination of volume resource policies and the above resource filters.
- When volume resource policies conflict with the above resource filters, we should respect the above resource filters. For example, if the user used the opt-out approach with the `backup.velero.io/backup-volumes-excludes` annotation on the pod and also defined the volume as included in the volume resource filter configuration, we should respect the opt-out approach and skip backup of the volume.
++ Also, please be sure to add this in the velero documentation
sure
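To illustrate the precedence rule quoted above: a pod opting a volume out via annotation wins over an include policy. A hedged sketch (the pod and volume names are illustrative; the annotation itself is the one from the design text):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                  # illustrative
  annotations:
    # opt-out wins: this volume is skipped even if a volume
    # resource policy would include it
    backup.velero.io/backup-volumes-excludes: logs-volume
```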
LGTM @qiuming-best
Fixes #5035