
Velero can't take the snapshots of PVCs of ebs gp3 volumes that are provisioned by AWS CSI driver #7251

Closed
arunkumarrspl opened this issue Dec 26, 2023 · 1 comment
Labels
Area/Cloud/AWS, Area/CSI (Related to Container Storage Interface support), Needs info (Waiting for information), target/1.12.3


@arunkumarrspl

What steps did you take and what happened:

I am trying to take snapshots of PVCs provisioned by the gp3 storage class. The storage class below was created using the EBS CSI driver:

NAME                      PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp3 (default)             ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   509d

The following Velero CRDs are installed:

backuprepositories.velero.io                            2023-09-19T14:06:24Z
backups.velero.io                                       2023-09-19T14:06:24Z
backupstoragelocations.velero.io                        2023-09-19T14:06:25Z
datadownloads.velero.io                                 2023-12-26T09:57:56Z
datauploads.velero.io                                   2023-12-26T09:57:57Z
deletebackuprequests.velero.io                          2023-09-19T14:06:25Z
downloadrequests.velero.io                              2023-09-19T14:06:26Z
podvolumebackups.velero.io                              2023-09-19T14:06:26Z
podvolumerestores.velero.io                             2023-09-19T14:06:27Z
restores.velero.io                                      2023-09-19T14:06:28Z
schedules.velero.io                                     2023-09-19T14:06:28Z
serverstatusrequests.velero.io                          2023-09-19T14:06:29Z
volumesnapshotlocations.velero.io                       2023-09-19T14:06:29Z

Then I installed the snapshot CRDs from https://raw.githubusercontent.com/kubernetes-sigs/aws-ebs-csi-driver/release-1.0/deploy/kubernetes/cluster/crd_snapshotter.yaml

Then I created a VolumeSnapshotClass:

NAME          DRIVER            DELETIONPOLICY   AGE
csi-aws-vsc   ebs.csi.aws.com   Delete           10d

These are the PVCs that Velero is trying to snapshot:

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
redis-data-authserver-redis-master-0     Bound    pvc-83f80e61-0aad-4d19-a081-5a4c1faa7864   8Gi        RWO            gp3            347d
redis-data-authserver-redis-replicas-0   Bound    pvc-0a9aaa4a-2fb5-4b51-a97c-a88aaecd3a2d   8Gi        RWO            gp3            347d
redis-data-authserver-redis-replicas-1   Bound    pvc-8153779f-68b6-4b41-99d6-71212395a68a   8Gi        RWO            gp3            347d
redis-data-authserver-redis-replicas-2   Bound    pvc-6fe8b683-76ad-405f-b38d-49aa8fe54470   8Gi        RWO            gp3            347d

What did you expect to happen:
Velero should be able to take snapshots of the PersistentVolumeClaims.

The following information will help us better understand what's going on:
Instead, the Velero backups end up PartiallyFailed because Velero cannot take snapshots of the PVCs.

$ velero backup get
NAME                                      STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
velero-fullclusterbackup-20231226101049   PartiallyFailed   5        0          2023-12-26 15:40:50 +0530 IST   3d        default            <none>
velero-fullclusterbackup-20231226000048   PartiallyFailed   5        0          2023-12-26 05:30:48 +0530 IST   3d        default            <none>
velero-fullclusterbackup-20231225000048   PartiallyFailed   5        0          2023-12-25 05:30:48 +0530 IST   2d        default            <none>
$ velero backup logs velero-fullclusterbackup-20231226101049
{"backup":"velero/velero-fullclusterbackup-20231226101049","error.message":"error executing custom action (groupResource=volumesnapshots.snapshot.storage.k8s.io, namespace=dmz, name=velero-redis-data-authserver-redis-replicas-2-6fxkb): rpc error: code = Aborted desc = plugin panicked: runtime error: invalid memory address or nil pointer dereference, stack trace: goroutine 513 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x65\ngithub.com/vmware-tanzu/velero/pkg/plugin/framework/common.HandlePanic({0x1ec70a0, 0x373f230})\n\t/go/pkg/mod/github.com/vmware-tanzu/velero@<version>/pkg/plugin/framework/common/handle_panic.go:43 +0x91\ngithub.com/vmware-tanzu/velero/pkg/plugin/framework/backupitemaction/v2.(*BackupItemActionGRPCServer).Execute.func1()\n\t/go/pkg/mod/github.com/vmware-tanzu/velero@<version>/pkg/plugin/framework/backupitemaction/v2/backup_item_action_server.go:90 +0x2c\npanic({0x1ec70a0, 0x373f230})\n\t/usr/local/go/src/runtime/panic.go:884 +0x213\ngithub.com/vmware-tanzu/velero-plugin-for-csi/internal/util.GetVolumeSnapshotContentForVolumeSnapshot(0xc0005feb40, {0x26c12f0?, 0xc0009be4b0}, {0x26d87c0, 0xc000416000}, 0x40?, 0x8bb2c97000)\n\t/go/src/velero-plugin-for-csi/internal/util/util.go:269 +0x298\ngithub.com/vmware-tanzu/velero-plugin-for-csi/internal/backup.(*VolumeSnapshotBackupItemAction).Execute(0xc0003880b0, {0x26c76a0, 0xc0005a8258}, 0xc00074a000)\n\t/go/src/velero-plugin-for-csi/internal/backup/volumesnapshot_action.go:98 +0x48c\ngithub.com/vmware-tanzu/velero/pkg/plugin/framework/backupitemaction/v2.(*BackupItemActionGRPCServer).Execute(0x20423c0?, {0xc000a19960?, 0x513786?}, 0xc000a19960)\n\t/go/pkg/mod/github.com/vmware-tanzu/velero@<version>/pkg/plugin/framework/backupitemaction/v2/backup_item_action_server.go:110 +0x366\ngithub.com/vmware-tanzu/velero/pkg/plugin/generated/backupitemaction/v2._BackupItemAction_Execute_Handler({0x1fbe360?, 0xc0000134a8}, {0x26c1018, 0xc0008d4990}, 0xc000a198f0, 0x0)\n\t/go/pkg/mod/github.com/vmware-tanzu/velero@<version>/pkg/plugin/generated/backupitemaction/v2/BackupItemAction.pb.go:794 +0x170\ngoogle.golang.org/grpc.(*Server).processUnaryRPC(0xc0008b03c0, {0x26ca358, 0xc000103040}, 0xc00071cc60, 0xc0008a4b70, 0x374ed18, 0x0)\n\t/go/pkg/mod/google.golang.org/grpc@<version>/server.go:1345 +0xdf3\ngoogle.golang.org/grpc.(*Server).handleStream(0xc0008b03c0, {0x26ca358, 0xc000103040}, 0xc00071cc60, 0x0)\n\t/go/pkg/mod/google.golang.org/grpc@<version>/server.go:1722 +0xa36\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2()\n\t/go/pkg/mod/google.golang.org/grpc@<version>/server.go:966 +0x98\ncreated by google.golang.org/grpc.(*Server).serveStreams.func1\n\t/go/pkg/mod/google.golang.org/grpc@<version>/server.go:964 +0x28a\n","level":"error","logSource":"pkg/backup/backup.go:448","msg":"Error backing up item","name":"authserver-redis-replicas-2","time":"2023-12-26T10:50:58Z"}

Environment:

  • Velero version (use velero version): v1.11.1
  • Velero features (use velero client config get features): features:
  • Kubernetes version (use kubectl version): 1.26
  • Kubernetes installer & version: EKS (v1.26.11-eks-8cb36c9)
  • Cloud provider or hardware configuration: aws EKS
  • OS (e.g. from /etc/os-release): Amazon Linux 2

arunkumarrspl changed the title from "velero csi plugin can't take the snapshots of PVCs of ebs gp3 volumes that are provisioned by" to "Velero can't take the snapshots of PVCs of ebs gp3 volumes that are provisioned by AWS CSI driver" on Dec 26, 2023
ywk253100 added the Area/CSI and Area/Cloud/AWS labels on Dec 27, 2023
@ywk253100
Contributor

This seems to be the same issue that was fixed by vmware-tanzu/velero-plugin-for-csi#215, which will be available in the next release.

Did you set csiSnapshotTimeout? Could you try to increase its value?
