-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(zfspv): adding backup and restore support #162
Conversation
Codecov Report
@@ Coverage Diff @@
## master #162 +/- ##
=========================================
- Coverage 9.88% 9.82% -0.07%
=========================================
Files 20 20
Lines 1163 1171 +8
=========================================
Hits 115 115
- Misses 1047 1055 +8
Partials 1 1
Continue to review full report at Codecov.
|
b9e893f
to
f90925c
Compare
f8248c3
to
6e9f607
Compare
97976fe
to
6e0145a
Compare
is | ||
minLength: 1 | ||
type: string | ||
prevSnapName: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q_: Are there names in sync with the proposed refactoring changes planned in velero plugin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proposed refactoring changes planned in velero plugin
not aware of this, can you share the proposal link.
pkg/driver/agent.go
Outdated
go func() { | ||
err := backup.Start(&ControllerMutex, stopCh) | ||
if err != nil { | ||
klog.Fatalf("Failed to start ZFS volume snapshot management controller: %s", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_: log message can probably say volume snapshot backup controller
and the below one as volume snapshot restore controller
.
@@ -63,7 +67,7 @@ var ( | |||
func init() { | |||
|
|||
OpenEBSNamespace = os.Getenv(OpenEBSNamespaceKey) | |||
if OpenEBSNamespace == "" { | |||
if OpenEBSNamespace == "" && os.Getenv("OPENEBS_NODE_DRIVER") != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is namespace only required for node driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node-driver env can be "agent" or "controller". So it is needed if we are using ZFS-LocalPV driver.
@@ -0,0 +1,246 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a doc.go to this folder that lists the various low-level cases that are supported and any limitations?
Few things that are not clear by going through the code easily are:
- how the restore differentiate between zvol and zfs datasets?
- how does the file-system information get passed on to the restore process?
For example:
The restore flow is as follows:
- plugin creates a restore PV (is this correct?)
- plugin create a ZFS restore CR to restore the data from remote backup into the restore PV?
- restore controller (on node) keeps a a watch for new CRs associated with the node id.
- if status != init or marked for deletion, restore controller will execute the zfs recv | create command.
Limitations:
- If the restore fails due to network issues? Will it be re-attempted?
- If the restore doesn't have the specified backup, will it marked as failed?
- Similar to the need for nodeID being same, the expectation is also that capacity should be available on the dest node/pool?
- What happens when the same volume is restored twice or if the dest pool already has a volume with the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes looks good.
pkg/driver/agent.go
Outdated
go func() { | ||
err := backup.Start(&ControllerMutex, stopCh) | ||
if err != nil { | ||
klog.Fatalf("Failed to start ZFS volume snapshot management controller: %s", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log message needs to be changed for backup and restart start function error handling.
pkg/mgmt/backup/backup.go
Outdated
klog.Infof("backup %s done %s@%s prevsnap [%s]", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName, bkp.Spec.PrevSnapName) | ||
err = zfs.UpdateBkpInfo(bkp, apis.BKPZFSStatusDone) | ||
} else { | ||
klog.Errorf("backup %s failed %s@%s", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to log error here for debugging
pkg/mgmt/backup/backup.go
Outdated
func (c *BkpController) addBkp(obj interface{}) { | ||
bkp, ok := obj.(*apis.ZFSBackup) | ||
if !ok { | ||
runtime.HandleError(fmt.Errorf("Couldn't get bkp object %#v", obj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: better to use full form instead of bkp
if zfs.NodeID != bkp.Spec.OwnerNodeID { | ||
return | ||
} | ||
klog.Infof("Got add event for Bkp %s snap %s@%s", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be move to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog I guess does not have debug. May be later we can add verbose to these logs.
} | ||
} else { | ||
// if status is init then it means we are creating the zfs backup. | ||
if bkp.Status == apis.BKPZFSStatusInit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this condition backup resource needs to be created with init status. Should we add check here to execute backup if status is empty also? Or we need to mark empty status as invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: else / if can be merged here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do the validation in the CR. working on adding validation schema for this.
pkg/mgmt/restore/restore.go
Outdated
// ZFSRestore should be deleted. Check if deletion timestamp is set | ||
if !c.isDeletionCandidate(rstr) { | ||
// if finalizer is not set then it means we are creating | ||
// the zfs backup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment needs to be changed.
Here adding Backup and Restore controller, which will be watching for the events. The velero plugin will create a Backup CR to create a backup with the remote location information, the controller will send the data to that remote location. In the same way, the velero plugin will create a Restore CR to restore the volume from the the remote location and the restore controller will restore the data. Steps to use velero plugin for ZFS-LocalPV are : 1. install velero 2. add openebs plugin velero plugin add openebs/velero-plugin:latest 3. Create the volumesnapshot location : for full backup :- ```yaml apiVersion: velero.io/v1 kind: VolumeSnapshotLocation metadata: name: default namespace: velero spec: provider: openebs.io/zfspv-blockstore config: bucket: velero prefix: zfs namespace: openebs provider: aws region: minio s3ForcePathStyle: "true" s3Url: http://minio.velero.svc:9000 ``` for incremental backup :- ```yaml apiVersion: velero.io/v1 kind: VolumeSnapshotLocation metadata: name: default namespace: velero spec: provider: openebs.io/zfspv-blockstore config: bucket: velero prefix: zfs backup: incremental namespace: openebs provider: aws region: minio s3ForcePathStyle: "true" s3Url: http://minio.velero.svc:9000 ``` 4. Create backup velero backup create my-backup --snapshot-volumes --include-namespaces=velero-ns --volume-snapshot-locations=aws-cloud-default --storage-location=default 5. Create Schedule velero create schedule newschedule --schedule="*/1 * * * *" --snapshot-volumes --include-namespaces=velero-ns --volume-snapshot-locations=aws-local-default --storage-location=default 6. Restore from backup velero restore create --from-backup my-backup --restore-volumes=true --namespace-mappings velero-ns:ns1 Signed-off-by: Pawan <[email protected]>
pkg/mgmt/backup/doc.go
Outdated
|
||
- Backup controller (on node) keeps a watch for new CRs associated with the node id. This node ID will be same as the Node ID present in the ZFSVolume resource. | ||
|
||
- The Backup controller will take a snapshot and then send the data to the remote location. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one probably needs little clarity. remote location here refers to velero controller that receives the data and then pushes to S3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, updated it.
pkg/mgmt/backup/doc.go
Outdated
|
||
- It will save the namespace information where the pvc is created also while taking the backup. Plugin will use this info if restoring without a namespace mapping to find if volume has already been restored. | ||
|
||
- plugin then creates the ZFSBackup CR with the destination volume and remote location where the data needs to be send. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to mention, status should be init
pkg/mgmt/backup/doc.go
Outdated
|
||
- Backup controller (on node) keeps a watch for new CRs associated with the node id. This node ID will be same as the Node ID present in the ZFSVolume resource. | ||
|
||
- The Backup controller will take a snapshot and then send the data to the remote location. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to remove send the data to remote location
since it is part of next step and may create confusion.
Signed-off-by: Pawan <[email protected]>
Signed-off-by: Pawan <[email protected]>
Why is this PR required? What issue does it fix?:
This is the PR to handle backup and restore request at ZFS-LocalPV side. There will be changes at the velero plugin side also and the work is in progress for that.
Here velero plugin will create the backup CR to create a backup and ZFS-LocalPV driver will watch for that CR and create the back and transfer the data to the mentioned localtion. In the similar way to restore the data from the remote location, velero plugin will create the restore CR with the remote location information where the data is and ZFS-LocalPV will restore from that location.
How to use?:
Steps to use velero plugin for ZFS-LocalPV are :
for backup :-
The PR in the Velero repo is here : openebs/velero-plugin#102
Does this PR require any upgrade changes?:
no
If the changes in this PR are manually verified, list down the scenarios covered and commands you used for testing with logs:
tested by doing backing up the full namespace, deleted it, and restored it and verified the data.
Any additional information for your reviewer?:
Limitation:
if restoring in a different cluster, it should have same nodes and same zpool present.
To Do:
incremental backup is supported fully. incremental restore is supported partially(with few limitation). Need to add full support for scheduled(incremental) restore.
Checklist:
<type>(<scope>): <subject>