
Add possibility to change or map StorageClass during backup using CSI Snapshots and DataMover #7700

Closed
fgleixner opened this issue Apr 17, 2024 · 8 comments

Comments

@fgleixner

Describe the problem/challenge you have
We use Longhorn and have several Longhorn StorageClasses defined for different workloads: some for SSD/NVMe disks, some for rotating rust, and some with 1, 2, or even 3 replicas.
When we run backups, we noticed that the PVC generated from the CSI snapshot uses the same StorageClass as the original volume.
So a PVC and a PV are created only for backup purposes, and they inherit the settings of the original PV, which may be expensive NVMe storage with 3 replicas. This can cause the backup to fail, because enough of this specific storage may not be available.

Describe the solution you'd like
I'd like a way to map StorageClasses during snapshot data movement, the same way it is already possible for restores:
https://velero.io/docs/v1.13/restore-reference/#changing-pvpvc-storage-classes
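For reference, the restore-time mapping linked above is driven by a ConfigMap in the velero namespace; a minimal example per that doc (the storage class names in `data` are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # any name works; Velero's plugin discovers the ConfigMap via the labels
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  # <source storage class>: <target storage class>
  old-storage-class: new-storage-class
```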

Anything else you would like to add:

Environment:

  • Velero version (use velero version): 1.12.1
  • Kubernetes version (use kubectl version): 1.28
  • Kubernetes installer & version: kubespray 2.24.1
  • Cloud provider or hardware configuration: on premises, Longhorn for storage
  • OS (e.g. from /etc/os-release): SLES 15

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@sseago
Collaborator

sseago commented Apr 17, 2024

To make sure I understand this: you're saying that the storage class mapping on backup would only be used for creating the temporary cloned-via-snapshot PVC that Velero uses for copying data to object storage, while the storage class stored in the backup for the PVC to be restored would be unchanged? Of course, if the storage class you want for the temporary data-movement PVC is the same as the one you want to restore to, you'd just use the same mapping for both backup and restore.

Or did you have something different in mind?

@fgleixner
Author

Exactly. The mapping structure in the ConfigMap could be the same and re-used, just for a different purpose. This should change nothing except the StorageClass of the temporary PVC used for copying data to object storage.

@Rohmilchkaese

Thanks for opening this, @fgleixner; we discovered the same thing, also with Longhorn as the storage backend!

@gh-tek

gh-tek commented Jun 18, 2024

I have this problem with Longhorn too. The temporary PVC for the data upload creates a huge disk I/O load, because it creates a replicated PV with a data-locality requirement. I created another storage class for the data upload, but then realized that there is no place to configure it. Velero's data upload seems to always use whatever storage class the original PV has.

@ehemmerlin

ehemmerlin commented Jun 21, 2024

We are facing the same issue: the backup uses a lot of disk space, because Longhorn replicates the data of each snapshot the same way it does for all of our Kubernetes volumes. Moreover, it retains these volumes, since Retain is the reclaim policy of our storage class, so volumes created during backups are never deleted, even after the backup has expired and no longer exists.

We are looking for a way to remove the volumes created by the snapshots once the backup has expired. Specifying a different storage class for Velero's backups (we could create one with a Delete reclaim policy instead of Retain) would solve this: volumes from expired backups would be deleted.

Link to the original issue: #6192

Being able to change the StorageClass during backup using CSI snapshots and the DataMover would allow us to set reclaimPolicy to Delete and numberOfReplicas to 1, which would fix the entire issue we face.
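A minimal sketch of the kind of backup-oriented Longhorn StorageClass described above, assuming Longhorn's documented `numberOfReplicas` and `dataLocality` parameters (the class name is made up):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-backup   # hypothetical name for a class used only by backup PVCs
provisioner: driver.longhorn.io
reclaimPolicy: Delete            # so PVs left over from expired backups get cleaned up
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "1"          # no replication needed for a short-lived backup copy
  dataLocality: "disabled"       # avoid the data-locality I/O load mentioned above
  staleReplicaTimeout: "30"
```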

@larssb

larssb commented Jul 9, 2024

> I have this problem with Longhorn too. The temporary PVC for the data upload creates a huge disk I/O load, because it creates a replicated PV with a data-locality requirement. I created another storage class for the data upload, but then realized that there is no place to configure it. Velero's data upload seems to always use whatever storage class the original PV has.

I think exactly this may be the cause of us seeing a worker node go down / get into a zombie state over the weekend.

  • I upgraded Velero to v1.14
  • CSI Data Mover is enabled

Not long after, we saw worker nodes go NotReady because of high CPU usage: first one worker, which recovered, and then another.

Then, from Friday to Saturday, another worker went haywire with a huge CPU spike and finally ended up in a total zombie state: Pods stuck in Terminating for a prolonged time, with the node not recovering even after its CPU and memory usage had eased down.


I think this issue is pretty important and thank you for looking at it.

Have a great day.

@Borrelhapje

I think this can be closed, as this feature will be released with the next minor release?

@Lyndon-Li Lyndon-Li reopened this Aug 27, 2024
@Lyndon-Li Lyndon-Li added this to the v1.15 milestone Aug 27, 2024
@Lyndon-Li
Contributor

Fixed by #7982 and #8109
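For anyone landing here later: as far as I can tell from the v1.15 data-movement documentation, the fix exposes this through the node-agent configuration ConfigMap, whose backupPVC section maps a source storage class to the class used for the intermediate backup PVC. A sketch only; the exact flag, key names, and class names below are assumptions to be checked against the released docs:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # assumed name; referenced when starting the node agent via its
  # node-agent ConfigMap flag (see the v1.15 docs for the exact flag)
  name: node-agent-config
  namespace: velero
data:
  # maps the source PV's storage class ("longhorn-nvme-3r", hypothetical)
  # to the class used for the temporary backup PVC ("longhorn-backup",
  # matching the StorageClass sketch earlier in this thread)
  node-agent-config.json: |
    {
      "backupPVC": {
        "longhorn-nvme-3r": {
          "storageClass": "longhorn-backup",
          "readOnly": true
        }
      }
    }
```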
