feat(zfspv): adding backup and restore support #162

Merged
merged 4 commits into from
Sep 8, 2020
4 changes: 2 additions & 2 deletions pkg/driver/agent.go
@@ -71,15 +71,15 @@ func NewNode(d *CSIDriver) csi.NodeServer {
go func() {
err := backup.Start(&ControllerMutex, stopCh)
if err != nil {
- klog.Fatalf("Failed to start ZFS volume snapshot management controller: %s", err.Error())
+ klog.Fatalf("Failed to start ZFS backup management controller: %s", err.Error())
}
}()

// start the restore controller
go func() {
err := restore.Start(&ControllerMutex, stopCh)
if err != nil {
- klog.Fatalf("Failed to start ZFS volume snapshot management controller: %s", err.Error())
+ klog.Fatalf("Failed to start ZFS restore management controller: %s", err.Error())
}
}()

4 changes: 2 additions & 2 deletions pkg/mgmt/backup/backup.go
@@ -90,7 +90,7 @@ func (c *BkpController) syncBkp(bkp *apis.ZFSBackup) error {
klog.Infof("backup %s done %s@%s prevsnap [%s]", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName, bkp.Spec.PrevSnapName)
err = zfs.UpdateBkpInfo(bkp, apis.BKPZFSStatusDone)
} else {
- klog.Errorf("backup %s failed %s@%s", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName)
+ klog.Errorf("backup %s failed %s@%s err %v", bkp.Name, bkp.Spec.VolumeName, bkp.Spec.SnapName, err)
err = zfs.UpdateBkpInfo(bkp, apis.BKPZFSStatusFailed)
}
}
@@ -102,7 +102,7 @@ func (c *BkpController) syncBkp(bkp *apis.ZFSBackup) error {
func (c *BkpController) addBkp(obj interface{}) {
bkp, ok := obj.(*apis.ZFSBackup)
if !ok {
- runtime.HandleError(fmt.Errorf("Couldn't get bkp object %#v", obj))
+ runtime.HandleError(fmt.Errorf("Couldn't get backup object %#v", obj))
return
}

47 changes: 47 additions & 0 deletions pkg/mgmt/backup/doc.go
@@ -0,0 +1,47 @@
/*
Copyright 2020 The OpenEBS Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

/*
The Backup flow is as follows:

- The plugin backs up the ZFSVolume CR so that it can be restored.

- While taking the backup, it also saves the namespace in which the PVC was created. When restoring without a namespace mapping, the plugin uses this information to check whether the volume has already been restored.

- The plugin then creates the ZFSBackup CR with the destination volume and the remote location where the data needs to be sent.
[Review comment, Member] Better to mention that the status should be Init.


- Backup controller (on node) keeps a watch for new CRs associated with the node ID. This node ID will be the same as the Node ID present in the ZFSVolume resource.

- The Backup controller will take a snapshot and then send the data to the remote location.
[Review comment, Member] This one probably needs a little clarity. The remote location here refers to the Velero controller, which receives the data and then pushes it to S3.

[Review comment, Contributor Author] Right, updated it.

[Review comment, Member] Better to remove "send the data to the remote location", since it is part of the next step and may create confusion.



- if Backup status == init and not marked for deletion, Backup controller will execute the `zfs send | remote-write` command.

- If the Backup is deleted, the corresponding snapshot also gets deleted.


Limitations:

- There should be enough space in the pool to accommodate the snapshot.

- If the backup fails because of a network error and:
* the Backup status update also fails, the backup will be retried from the beginning (TODO optimize it).
* the Backup status update succeeds, the Backup operation will fail.

- A snapshot will exist as long as the Backup is present, and it will be cleaned up when the Backup is deleted.

*/

package backup
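
The backup flow described in the doc.go comment above can be sketched as a single reconcile pass. This is a minimal illustration under stated assumptions, not the controller code from this PR: the `Backup` struct, the `Status` constants, `ErrTransfer`, and the `send` and `destroySnap` callbacks are all hypothetical stand-ins for the ZFSBackup CR, its status values, and the snapshot plus `zfs send | remote-write` step.

```go
package main

import "errors"

// Status is a hypothetical stand-in for the ZFSBackup status field.
type Status string

const (
	StatusInit   Status = "Init"
	StatusDone   Status = "Done"
	StatusFailed Status = "Failed"
)

// ErrTransfer is a hypothetical sentinel used to simulate a failed transfer.
var ErrTransfer = errors.New("transfer to remote location failed")

// Backup is a trimmed-down, hypothetical stand-in for the ZFSBackup CR.
type Backup struct {
	Name            string
	OwnerNodeID     string // node that holds the ZFSVolume
	Status          Status
	DeletionPending bool // true once a deletion timestamp is set
}

// syncBackup performs one reconcile pass for a single Backup.
// send stands in for "take snapshot, then zfs send | remote-write";
// destroySnap stands in for deleting the snapshot that backs the Backup.
func syncBackup(nodeID string, b *Backup, send, destroySnap func(*Backup) error) error {
	// Each node's controller only handles CRs that carry its own node ID.
	if b.OwnerNodeID != nodeID {
		return nil
	}
	// A deleted Backup takes its snapshot with it.
	if b.DeletionPending {
		return destroySnap(b)
	}
	// Only a Backup that is still in Init triggers a transfer.
	if b.Status != StatusInit {
		return nil
	}
	if err := send(b); err != nil {
		b.Status = StatusFailed
		return err
	}
	b.Status = StatusDone
	return nil
}
```

Passing the transfer and cleanup steps as callbacks keeps the decision logic testable without a ZFS pool; in this sketch the send error is returned to the caller, which is a simplification and not necessarily what the controller in this PR returns.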
48 changes: 48 additions & 0 deletions pkg/mgmt/restore/doc.go
@@ -0,0 +1,48 @@
/*
Copyright 2020 The OpenEBS Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

/*
The restore flow is as follows:
- plugin creates a restore storage volume (zvol or dataset)

At backup time, the plugin backs up the ZFSVolume CR, so during the restore we have all the information related to that volume. The plugin first creates the restore destination to store the data.

- plugin then creates the ZFSRestore CR with the destination volume and the remote location from which the data needs to be read.

- restore controller (on node) keeps a watch for new CRs associated with the node ID. This node ID will be the same as the Node ID present in the ZFSVolume resource.

- if Restore status == init and not marked for deletion, Restore controller will execute the `remote-read | zfs recv` command.


Limitations of the initial version:

- The destination cluster should have the same node ID and Zpool present.

- If the volume was thick provisioned, the destination Zpool should have enough space for that volume.

- The destination volume should be present before starting the Restore operation.

- If the restore fails due to network issues and
* the status update succeeds, the Restore will not be re-attempted.
* the status update fails, the Restore will be re-attempted from the beginning (TODO optimize it).

- If the restore doesn't have the specified backup, the plugin itself fails that restore request as there is no Backup to Restore from.

- If the same volume is restored twice, the data will be written again. The plugin should fail this kind of request.

*/

package restore
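
The retry limitation listed in the doc.go comment above (a restore is re-attempted only when the status update is lost) can be sketched as follows. This is an illustration under assumptions rather than the PR's implementation: `RestoreStatus`, its constants, `ErrNetwork`, and the `transfer` and `update` callbacks are hypothetical names, with `transfer` standing in for `remote-read | zfs recv` and `update` standing in for writing the status back to the CR.

```go
package main

import "errors"

// RestoreStatus is a hypothetical stand-in for the ZFSRestore status field.
type RestoreStatus string

const (
	RestoreInit   RestoreStatus = "Init"
	RestoreDone   RestoreStatus = "Done"
	RestoreFailed RestoreStatus = "Failed"
)

// ErrNetwork is a hypothetical sentinel used to simulate a network failure.
var ErrNetwork = errors.New("network error")

// reconcileRestore runs one pass over a restore whose stored status is
// *persisted. It reports whether the data transfer was attempted in this pass.
func reconcileRestore(persisted *RestoreStatus, transfer func() error, update func(RestoreStatus) error) (attempted bool, err error) {
	// Anything other than Init has already been handled.
	if *persisted != RestoreInit {
		return false, nil
	}
	next := RestoreDone
	if terr := transfer(); terr != nil {
		next = RestoreFailed
	}
	if uerr := update(next); uerr != nil {
		// The status write was lost, so the stored status is still Init
		// and the next pass starts the transfer over from the beginning.
		return true, uerr
	}
	*persisted = next
	return true, nil
}
```

Because the stored status is the only record of progress in this sketch, a lost status write is indistinguishable from a restore that never ran, which is why the whole transfer repeats.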
5 changes: 2 additions & 3 deletions pkg/mgmt/restore/restore.go
@@ -76,10 +76,9 @@ func (c *RstrController) enqueueRestore(obj interface{}) {
// ZFSRestore
func (c *RstrController) syncRestore(rstr *apis.ZFSRestore) error {
var err error = nil
- // ZFSRestore should be deleted. Check if deletion timestamp is set
+ // ZFSRestore should not be deleted. Check if deletion timestamp is set
if !c.isDeletionCandidate(rstr) {
- // if finalizer is not set then it means we are creating
- // the zfs backup.
+ // if status is Init, then only do the restore
if rstr.Status == apis.RSTZFSStatusInit {
err = zfs.CreateRestore(rstr)
if err == nil {
2 changes: 1 addition & 1 deletion pkg/mgmt/restore/start.go
@@ -37,7 +37,7 @@ var (
kubeconfig string
)

- // Start starts the zfsbackup controller.
+ // Start starts the zfsrestore controller.
func Start(controllerMtx *sync.RWMutex, stopCh <-chan struct{}) error {

// Get in cluster config
2 changes: 1 addition & 1 deletion pkg/zfs/volume.go
@@ -278,7 +278,7 @@ func UpdateBkpInfo(bkp *apis.ZFSBackup, status apis.ZFSBackupStatus) error {
newBkp.Status = status

if err != nil {
- klog.Errorf("Update snapshot failed %s err: %s", bkp.Spec.VolumeName, err.Error())
+ klog.Errorf("Update backup failed %s err: %s", bkp.Spec.VolumeName, err.Error())
return err
}
