feat(zfspv): adding backup and restore support #162

Merged
merged 4 commits into from
Sep 8, 2020

@pawanpraka1 pawanpraka1 commented Jun 25, 2020

Why is this PR required? What issue does it fix?:

This PR handles backup and restore requests on the ZFS-LocalPV side. Changes are also needed on the Velero plugin side, and that work is in progress.

The Velero plugin creates a backup CR to request a backup; the ZFS-LocalPV driver watches for that CR, creates the backup, and transfers the data to the specified location. Similarly, to restore data from a remote location, the Velero plugin creates a restore CR containing the remote location where the data resides, and ZFS-LocalPV restores from that location.

How to use?:

Steps to use the Velero plugin for ZFS-LocalPV:

  1. Install Velero
velero install --provider aws --bucket velero --secret-file /home/pawan/pawan/credentials-minio --plugins velero/velero-plugin-for-aws:v1.0.0-beta.1 --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000 --use-volume-snapshots=true --use-restic

  2. Add the OpenEBS plugin
velero plugin add openebs/velero-plugin:latest
  3. Create the VolumeSnapshotLocation:

for backup:

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: openebs.io/zfspv-blockstore
  config:
    bucket: velero
    prefix: zfs
    namespace: openebs # the namespace where ZFS-LocalPV creates all the CRs, passed as the OPENEBS_NAMESPACE env in the ZFS-LocalPV deployment
    provider: aws
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000
  4. Create a backup
velero backup create my-backup --snapshot-volumes --include-namespaces=velero-ns --volume-snapshot-locations=default --storage-location=default
  5. Create a schedule
velero create schedule newschedule  --schedule="*/1 * * * *" --snapshot-volumes --include-namespaces=velero-ns --volume-snapshot-locations=aws-local-default --storage-location=default
  6. Restore from a backup
velero restore create --from-backup my-backup --restore-volumes=true --namespace-mappings velero-ns:ns-2

The PR in the Velero repo is here : openebs/velero-plugin#102

Does this PR require any upgrade changes?:

no

If the changes in this PR are manually verified, list down the scenarios covered and commands you used for testing with logs:

  • velero full backup and restore
  • velero schedule(full) backup and restore
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: openebs.io/zfspv-blockstore
  config:
    bucket: velero
    prefix: zfs
    namespace: openebs
    provider: aws
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000

Tested by backing up the full namespace, deleting it, restoring it, and verifying the data.

Any additional information for your reviewer?:

  1. added support for full backup
  2. added support for scheduled(full) backup

Limitation:

If restoring to a different cluster, that cluster must have the same nodes and the same zpool present.

To Do:

Incremental backup is fully supported. Incremental restore is partially supported (with a few limitations). Full support for scheduled (incremental) restore still needs to be added.

Checklist:

  • Fixes #
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/openebs-docs is used to track them:

@pawanpraka1 added the enhancement and pr/hold-merge labels Jun 25, 2020
@pawanpraka1 added this to the v0.9 milestone Jun 25, 2020
@pawanpraka1 requested a review from kmova June 25, 2020 08:21

codecov-commenter commented Jun 25, 2020

Codecov Report

Merging #162 into master will decrease coverage by 0.06%.
The diff coverage is 0.00%.


@@            Coverage Diff            @@
##           master    #162      +/-   ##
=========================================
- Coverage    9.88%   9.82%   -0.07%     
=========================================
  Files          20      20              
  Lines        1163    1171       +8     
=========================================
  Hits          115     115              
- Misses       1047    1055       +8     
  Partials        1       1              
Impacted Files Coverage Δ
pkg/driver/agent.go 0.00% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a5e645b...247cee0.

@pawanpraka1
Contributor Author

@kmova I am already working on that part #200.

is
minLength: 1
type: string
prevSnapName:
Member

q_: Are these names in sync with the proposed refactoring changes planned in the velero plugin?

Contributor Author

@pawanpraka1 Sep 7, 2020

proposed refactoring changes planned in velero plugin

Not aware of this; can you share the proposal link?

go func() {
	err := backup.Start(&ControllerMutex, stopCh)
	if err != nil {
		klog.Fatalf("Failed to start ZFS volume snapshot management controller: %s", err.Error())
Member

n_: log message can probably say volume snapshot backup controller and the below one as volume snapshot restore controller.

@@ -63,7 +67,7 @@ var (
 func init() {

 	OpenEBSNamespace = os.Getenv(OpenEBSNamespaceKey)
-	if OpenEBSNamespace == "" {
+	if OpenEBSNamespace == "" && os.Getenv("OPENEBS_NODE_DRIVER") != "" {
Member

Is namespace only required for node driver?

Contributor Author

@pawanpraka1 Sep 7, 2020

node-driver env can be "agent" or "controller". So it is needed if we are using ZFS-LocalPV driver.
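
The guard discussed in this thread can be sketched in isolation. This is a minimal sketch, not the driver's actual `init()`: the helper `resolveNamespace` and its injected `getenv` parameter are hypothetical, and `OPENEBS_NAMESPACE` is assumed to be the value behind `OpenEBSNamespaceKey` (the PR description names that env variable). The point it illustrates is that an empty namespace is only an error when `OPENEBS_NODE_DRIVER` is set, that is, when the binary is actually running as the ZFS-LocalPV driver.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Environment variable names taken from the PR discussion. That
// OpenEBSNamespaceKey resolves to "OPENEBS_NAMESPACE" is an assumption.
const (
	namespaceKey  = "OPENEBS_NAMESPACE"
	nodeDriverKey = "OPENEBS_NODE_DRIVER"
)

// resolveNamespace is a hypothetical helper, not the driver's real code.
// getenv is injected so the check can be exercised without touching the
// real process environment.
func resolveNamespace(getenv func(string) string) (string, error) {
	ns := getenv(namespaceKey)
	// The namespace is mandatory only when OPENEBS_NODE_DRIVER is set
	// (the thread mentions the values "agent" and "controller").
	if ns == "" && getenv(nodeDriverKey) != "" {
		return "", errors.New(namespaceKey + " environment variable not set")
	}
	return ns, nil
}

func main() {
	ns, err := resolveNamespace(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("namespace: %q\n", ns)
}
```

Injecting `getenv` is a design choice of this sketch only; it keeps the rule testable without mutating the process environment.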

@@ -0,0 +1,246 @@
/*
Member

Can we add a doc.go to this folder that lists the various low-level cases that are supported and any limitations?

A few things that are not easily clear from going through the code:

  • how does the restore differentiate between zvol and zfs datasets?
  • how does the file-system information get passed on to the restore process?

For example:

The restore flow is as follows:

  • plugin creates a restore PV (is this correct?)
  • plugin creates a ZFS restore CR to restore the data from the remote backup into the restore PV?
  • restore controller (on node) keeps a watch for new CRs associated with the node id.
  • if status != init or marked for deletion, restore controller will execute the zfs recv | create command.

Limitations:

  • If the restore fails due to network issues, will it be re-attempted?
  • If the restore doesn't have the specified backup, will it be marked as failed?
  • Similar to the need for the nodeID being the same, is the expectation also that capacity should be available on the dest node/pool?
  • What happens when the same volume is restored twice, or if the dest pool already has a volume with the same name?

Contributor Author

added.

Member

@mynktl left a comment

Changes look good.

go func() {
	err := backup.Start(&ControllerMutex, stopCh)
	if err != nil {
		klog.Fatalf("Failed to start ZFS volume snapshot management controller: %s", err.Error())
Member

The log message needs to be changed in the error handling of both the backup and restore Start functions.

	klog.Infof("backup %s done %s@%s prevsnap [%s]", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName, bkp.Spec.PrevSnapName)
	err = zfs.UpdateBkpInfo(bkp, apis.BKPZFSStatusDone)
} else {
	klog.Errorf("backup %s failed %s@%s", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName)
Member

better to log error here for debugging
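
One way to act on this suggestion is to append the underlying error to the failure line. The sketch below is self-contained and therefore uses the standard `log` package in place of `klog`; `backupInfo`, `failureMessage`, and `errSimulated` are hypothetical names introduced only for illustration, and the struct carries just the three fields that appear in the reviewed log call.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// backupInfo is a stand-in for the ZFSBackup resource; it holds only the
// fields used in the log line and is not the real type definition.
type backupInfo struct {
	Name, VolumeName, SnapName string
}

// errSimulated is a made-up failure used to demonstrate the message format.
var errSimulated = errors.New("connection refused")

// failureMessage formats the failure line with the error appended, so the
// cause is visible when debugging.
func failureMessage(b backupInfo, err error) string {
	return fmt.Sprintf("backup %s failed %s@%s err %v", b.Name, b.VolumeName, b.SnapName, err)
}

func main() {
	b := backupInfo{Name: "my-backup", VolumeName: "pvc-1234", SnapName: "snap1"}
	// In the controller this would be a klog.Errorf call; log.Print is
	// used here only to keep the sketch free of external dependencies.
	log.Print(failureMessage(b, errSimulated))
}
```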

func (c *BkpController) addBkp(obj interface{}) {
	bkp, ok := obj.(*apis.ZFSBackup)
	if !ok {
		runtime.HandleError(fmt.Errorf("Couldn't get bkp object %#v", obj))
Member

nit: better to use full form instead of bkp

	if zfs.NodeID != bkp.Spec.OwnerNodeID {
		return
	}
	klog.Infof("Got add event for Bkp %s snap %s@%s", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName)
Member

Can be moved to debug.

Contributor Author

I guess klog does not have a debug level. Maybe later we can add verbosity levels to these logs.
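
For reference, klog has no named debug level, but it does offer leveled logging through `klog.V(level).Infof(...)`, gated by the `-v` flag. The sketch below imitates that shape with the standard library only, so it runs without dependencies; the `verbose` type, the `V` function, and the `verbosity` variable are local stand-ins written for this example, not klog's implementation, and level 4 is an arbitrary choice.

```go
package main

import (
	"fmt"
	"os"
)

// verbosity is the active threshold. In klog the -v flag plays this role;
// here it is a plain package variable for illustration.
var verbosity = 0

// verbose records whether a message at some level should be emitted.
type verbose bool

// V mirrors the shape of klog.V: the result is true when the requested
// level does not exceed the configured verbosity.
func V(level int) verbose {
	return verbose(level <= verbosity)
}

// Infof prints the message only when the receiver is enabled.
func (v verbose) Infof(format string, args ...interface{}) {
	if v {
		fmt.Fprintf(os.Stderr, format+"\n", args...)
	}
}

func main() {
	// At verbosity 0 this call is suppressed.
	V(4).Infof("Got add event for Bkp %s snap %s@%s", "my-backup", "pvc-1234", "snap1")

	// After raising the threshold, the same call prints.
	verbosity = 4
	V(4).Infof("Got add event for Bkp %s snap %s@%s", "my-backup", "pvc-1234", "snap1")
}
```

With real klog, the equivalent change to the reviewed line would be to wrap it as `klog.V(4).Infof(...)`, leaving the message itself untouched.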

	}
} else {
	// if status is init then it means we are creating the zfs backup.
	if bkp.Status == apis.BKPZFSStatusInit {
Member

With this condition, the backup resource needs to be created with init status. Should we add a check here to also execute the backup if the status is empty? Or do we need to mark an empty status as invalid?

Member

nit: else / if can be merged here

Contributor Author

will do the validation in the CR. working on adding validation schema for this.
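
To make the two options raised in this thread concrete, here is a minimal sketch of a controller-side decision in which only an init status starts a backup and an empty status is rejected. It is an illustration only: the author's stated plan is to enforce this through CR schema validation instead. The `action` type and `decide` function are hypothetical, and the string value `"Init"` is an assumed stand-in for `apis.BKPZFSStatusInit`.

```go
package main

import "fmt"

// statusInit is an assumed value for this sketch; the real constant is
// apis.BKPZFSStatusInit, whose underlying string is not shown in the thread.
const statusInit = "Init"

// action describes what the controller would do with a backup resource.
type action int

const (
	actionIgnore action = iota // already handled or in another state
	actionBackup               // start the backup
	actionReject               // treat the resource as invalid
)

// decide maps a backup status to an action. An empty status is rejected
// rather than silently treated as init.
func decide(status string) action {
	switch status {
	case statusInit:
		return actionBackup
	case "":
		return actionReject
	default:
		return actionIgnore
	}
}

func main() {
	for _, s := range []string{"Init", "", "Done"} {
		fmt.Printf("status %q -> action %d\n", s, decide(s))
	}
}
```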

// ZFSRestore should be deleted. Check if deletion timestamp is set
if !c.isDeletionCandidate(rstr) {
	// if finalizer is not set then it means we are creating
	// the zfs backup.
Member

comment needs to be changed.

This adds the Backup and Restore controllers, which watch for
events. The velero plugin will create a Backup CR to create a backup
with the remote location information, and the controller will send the data
to that remote location.

In the same way, the velero plugin will create a Restore CR to restore the
volume from the remote location, and the restore controller will restore
the data.

Steps to use velero plugin for ZFS-LocalPV are :

1. install velero

2. add openebs plugin

velero plugin add openebs/velero-plugin:latest

3. Create the volumesnapshot location :

for full backup :-

```yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: openebs.io/zfspv-blockstore
  config:
    bucket: velero
    prefix: zfs
    namespace: openebs
    provider: aws
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000
```

for incremental backup :-

```yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: openebs.io/zfspv-blockstore
  config:
    bucket: velero
    prefix: zfs
    backup: incremental
    namespace: openebs
    provider: aws
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000
```

4. Create backup

velero backup create my-backup --snapshot-volumes --include-namespaces=velero-ns --volume-snapshot-locations=aws-cloud-default --storage-location=default

5. Create Schedule

velero create schedule newschedule  --schedule="*/1 * * * *" --snapshot-volumes --include-namespaces=velero-ns --volume-snapshot-locations=aws-local-default --storage-location=default

6. Restore from backup

velero restore create --from-backup my-backup --restore-volumes=true --namespace-mappings velero-ns:ns1
Signed-off-by: Pawan <[email protected]>

- Backup controller (on node) keeps a watch for new CRs associated with the node id. This node ID will be same as the Node ID present in the ZFSVolume resource.

- The Backup controller will take a snapshot and then send the data to the remote location.
Member

This one probably needs a little more clarity: the remote location here refers to the velero controller, which receives the data and then pushes it to S3.

Contributor Author

right, updated it.
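
The "take a snapshot, then send the data" step, together with the `prevsnap` field logged elsewhere in this PR, suggests a full versus incremental stream. The sketch below shows how the arguments for `zfs send` could be assembled under that assumption: a full stream when there is no previous snapshot, and `-i <previous> <current>` otherwise. The `sendArgs` helper and the dataset name `zfspv-pool/pvc-1234` are hypothetical, and the controller's real command construction is not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// sendArgs builds the argument list for `zfs send`. dataset is the
// pool-qualified volume name, snap the snapshot to send, and prevSnap the
// earlier snapshot for an incremental stream (empty for a full stream).
func sendArgs(dataset, snap, prevSnap string) []string {
	target := dataset + "@" + snap
	if prevSnap == "" {
		// Full stream: zfs send pool/vol@snap
		return []string{"send", target}
	}
	// Incremental stream: zfs send -i pool/vol@prev pool/vol@snap
	return []string{"send", "-i", dataset + "@" + prevSnap, target}
}

func main() {
	full := sendArgs("zfspv-pool/pvc-1234", "snap2", "")
	incr := sendArgs("zfspv-pool/pvc-1234", "snap2", "snap1")
	fmt.Println("zfs " + strings.Join(full, " "))
	fmt.Println("zfs " + strings.Join(incr, " "))
}
```

Returning an argument slice rather than a single command string keeps the result usable with `exec.Command` without shell quoting concerns.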


- While taking the backup, it will also save the namespace in which the pvc was created. The plugin will use this info when restoring without a namespace mapping to find out whether the volume has already been restored.

- plugin then creates the ZFSBackup CR with the destination volume and remote location where the data needs to be sent.
Member

better to mention, status should be init


- Backup controller (on node) keeps a watch for new CRs associated with the node id. This node ID will be same as the Node ID present in the ZFSVolume resource.

- The Backup controller will take a snapshot and then send the data to the remote location.
Member

Better to remove "send the data to the remote location", since it is part of the next step and may cause confusion.

Signed-off-by: Pawan <[email protected]>
Signed-off-by: Pawan <[email protected]>
@kmova kmova merged commit e40026c into openebs:master Sep 8, 2020
@pawanpraka1 pawanpraka1 deleted the backup branch September 8, 2020 08:53
@pawanpraka1 added the feature label and removed the enhancement label Sep 9, 2020

4 participants