velero restic backup timeout #6450

Closed
ugur99 opened this issue Jul 3, 2023 · 8 comments
ugur99 commented Jul 3, 2023

What steps did you take and what happened:
Velero Restic PodVolumeBackups have started to get stuck on the first attempt, and the backup cannot proceed to the other volumes.

What did you expect to happen:
Even if it hangs on some PodVolumeBackups, I believe that after a per-volume timeout it should be able to continue with the next volume.

The following information will help us better understand what's going on:

# backup logs
...
name: /dashboard-oauth-proxy-c64f74f59-b89sm error: /timed out waiting for all PodVolumeBackups to complete

time="2023-07-03T02:00:22Z" level=error msg="Error backing up item" backup=velero/backup-velero-full-backup-daily-20230702220021 error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:243" error.function="github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:435" name=dashboard-oauth-proxy-c64f74f59-b89sm

...

# backup describe 
...
restic Backups:
  New:
    argo-rollouts/dashboard-oauth-proxy-c64f74f59-b89sm: some-env
...

#podvolumebackup:
...
Status:
  Progress:
Events:  <none>

Anything else you would like to add:
I know that it is best practice to ignore some dummy volumes to make Velero Restic Backup work more efficiently, but since we cannot do this by volume type, we are currently looking for an easy way to do this for the entire cluster. I mean, if there is a way to tell Restic to only back up pvc-type volumes, that would be great.

Environment:

  • Velero version (use velero version):
Client:
	Version: v1.11.0
	Git commit: -
Server:
	Version: v1.11.0
  • Velero features (use velero client config get features):
features: <NOT SET>
  • Kubernetes version (use kubectl version):
Server Version: v1.24.10
  • Kubernetes installer & version:
kubespray 
sseago (Collaborator) commented Jul 3, 2023

The current timeout setting is for the whole backup -- i.e. if a pod volume backup is not completed 4 hours since backup started, then time out (which means subsequent ones will time out too) -- the intent of this timeout is "choose a time that no backup should take longer than". If you had 100 volumes and each was taking over an hour, then your backup would take days. It may be that there would be some value in an additional "time out an individual volume if this volume takes longer than x minutes", but that should probably be an enhancement to add it on top of the current timeout, not instead of.
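
To illustrate the behavior described above, here is a toy Python model (not Velero's actual code) of a single deadline shared by the whole backup. It assumes volumes are processed one after another, which is a simplification. The names `run_backup` and `per_volume_timeout` and the volume names are hypothetical; the optional per-volume limit models the enhancement that this thread discusses only as a possibility.

```python
from dataclasses import dataclass


@dataclass
class VolumeResult:
    name: str
    status: str  # "Completed" or "TimedOut"


def run_backup(durations, backup_timeout, per_volume_timeout=None):
    """Toy model of a backup whose volumes share one overall deadline.

    durations: dict of volume name -> seconds that volume would need
               (float("inf") models a hung PodVolumeBackup).
    backup_timeout: deadline for the whole backup, counted from its start.
    per_volume_timeout: hypothetical extra limit for a single volume.
    """
    elapsed = 0.0
    results = []
    for name, needed in durations.items():
        remaining = backup_timeout - elapsed
        if remaining <= 0:
            # The shared deadline has already passed, so every
            # subsequent volume times out as well.
            results.append(VolumeResult(name, "TimedOut"))
            continue
        budget = remaining
        if per_volume_timeout is not None:
            budget = min(remaining, per_volume_timeout)
        if needed <= budget:
            elapsed += needed
            results.append(VolumeResult(name, "Completed"))
        else:
            elapsed += budget
            results.append(VolumeResult(name, "TimedOut"))
    return results


# Hypothetical volumes: the first one hangs forever.
volumes = {"hung-volume": float("inf"), "data-a": 600, "data-b": 300}

shared_only = run_backup(volumes, backup_timeout=4 * 3600)
print([r.status for r in shared_only])
# ['TimedOut', 'TimedOut', 'TimedOut']

with_per_volume = run_backup(volumes, backup_timeout=4 * 3600, per_volume_timeout=1800)
print([r.status for r in with_per_volume])
# ['TimedOut', 'Completed', 'Completed']
```

With only the shared deadline, one hung volume consumes the whole budget and the remaining volumes fail too, which matches the symptom reported in this issue.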

ugur99 (Author) commented Jul 3, 2023

Hmm yes that makes sense @sseago; thank you for the explanation.

Lyndon-Li (Contributor) commented

@ugur99
Could you share some more information about the dummy volumes, and why the backup of those volumes always gets stuck?

ugur99 (Author) commented Jul 10, 2023

Thanks for the support, @Lyndon-Li. Due to confidentiality, I'm afraid I can't share the full Velero debug output, but I'd be happy to share specific logs or manifests, whatever you need to troubleshoot.

To be honest, though, I've already shared the related logs and descriptions in the issue; there is no useful information even in debug mode.

From time to time we observe that restic gets stuck when backing up volumes such as configMap or emptyDir, and we have no idea what is going on.

Lyndon-Li (Contributor) commented

@ugur99
I don't think configMap or emptyDir volumes necessarily need to be backed up by file system backup:

  • There is a direct way to back up the configMap itself using Velero
  • The data in an emptyDir is considered ephemeral, so there is no value in protecting it

Therefore, the best practice is to exclude them from the backup. Let's see how we can filter volumes by their type, as a future enhancement of Velero's filter system.
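
One way to apply this across many pods today is Velero's documented opt-out annotation `backup.velero.io/backup-volumes-excludes`, which lists the volume names that file system backup should skip. The Python sketch below computes that annotation for a pod manifest by excluding every volume that is not backed by a PersistentVolumeClaim. The helper names and the sample pod are hypothetical, and applying the annotation to live pods (for example through their Deployment templates) is left out.

```python
# Velero's opt-out annotation for file system backup.
EXCLUDES_ANNOTATION = "backup.velero.io/backup-volumes-excludes"


def non_pvc_volume_names(pod):
    """Return the names of all pod volumes that are not PVC-backed."""
    volumes = pod.get("spec", {}).get("volumes", [])
    return [v["name"] for v in volumes if "persistentVolumeClaim" not in v]


def excludes_annotation(pod):
    """Build the annotation that excludes every non-PVC volume.

    Returns an empty dict when there is nothing to exclude.
    """
    names = non_pvc_volume_names(pod)
    if not names:
        return {}
    return {EXCLUDES_ANNOTATION: ",".join(names)}


# Hypothetical pod manifest, already parsed into a dict.
sample_pod = {
    "metadata": {"name": "example-pod"},
    "spec": {
        "volumes": [
            {"name": "app-config", "configMap": {"name": "app-config"}},
            {"name": "scratch", "emptyDir": {}},
            {"name": "data", "persistentVolumeClaim": {"claimName": "data-pvc"}},
        ]
    },
}

print(excludes_annotation(sample_pod))
# {'backup.velero.io/backup-volumes-excludes': 'app-config,scratch'}
```

This only approximates type-based filtering from the outside; a built-in filter by volume type is what the enhancement mentioned above would provide.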

Lyndon-Li added the Area/Filters and 1.13-candidate labels Jul 11, 2023
Lyndon-Li removed the 1.13-candidate and Area/Filters labels Jul 11, 2023

Lyndon-Li (Contributor) commented

FYI, I've opened a separate issue #6482 for the filter enhancement.

ugur99 (Author) commented Jul 16, 2023

Thank you @Lyndon-Li!

Lyndon-Li (Contributor) commented

Closing as there is no further request for this issue.
