Allow velero install to specify tolerations for daemonset #2898

Open
jbmassicotte opened this issue Sep 2, 2020 · 6 comments
Labels: Icebox (We see the value, but it is not slated for the next couple releases.); Kopia; Needs Product (Blocked needing input or feedback from Product); Restic (Relates to the restic integration); Reviewed Q2 2021

Comments

@jbmassicotte

jbmassicotte commented Sep 2, 2020

Edited this earlier post of mine given the more recent info I’ve gathered.

What I did

  • Installed velero client 1.4.2
  • Created a credential file, following instructions from velero-plugin-for-azure
  • Deployed velero server in Azure cluster using:
CREDENTIAL_FILE=${HOME}/credentials-velero
BLOB_CONTAINER=dcieastus2dev03cont
AZURE_BACKUP_RESOURCE_GROUP=dci-eastus2-dev-gen-rg-03
AZURE_STORAGE_ACCOUNT=dcieastus2dev03st
API_TIMEOUT=5m

velero install \
    --provider azure \
    --plugins velero/velero-plugin-for-microsoft-azure:v1.1.0 \
    --bucket $BLOB_CONTAINER \
    --secret-file $CREDENTIAL_FILE \
    --backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT \
    --snapshot-location-config apiTimeout=$API_TIMEOUT,resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP \
    --use-restic
  • I am using restic because my app mounts an AzureFile volume (it also mounts 3 ManagedDisk volumes but these are supported natively by velero)
  • I added the AzureFile volume name to the app pod annotation, as required by restic (backup.velero.io/backup-volumes: <volumename>)
  • I also added the mountOptions nouser_xattr to the AzureFile storageclass, again, as required by restic
  • Attempted to create a backup: velero backup create backup1 --include-namespaces mynamespace

The problem

  • velero backup describe backup1 --details shows the backup stuck InProgress, with no errors and no warnings. See attached file.
  • The last log line from kubectl logs deployment/velero -n velero says 'Initializing restic repository'

What did you expect to happen:
The backup to complete

Anything else you would like to add:

  • I can see in Azure Portal that velero created a folder called restic under the Azure container, so I know the container location is valid
  • I removed the AzureFile volume name from the pod annotation and restarted velero with the use-restic flag still on; the backup succeeded this time, which points to restic as the culprit.
    BUT: I also tried removing the use-restic flag (and checked that the restic daemonset was not started) and added the pod annotation back. The backup then failed with the same "Initializing restic repo" condition. What's up with that!?
  • I am starting to believe this is a bug, so please prove me wrong

Environment:

$ kubectl version --short
Client Version: v1.15.10
Server Version: v1.17.9
$ velero client config get features
features: <NOT SET>
$ velero version
Client:
        Version: v1.4.2
        Git commit: 56a08a4d695d893f0863f697c2f926e27d70c0c5
Server:
        Version: v1.4.2

create-backup.txt

@jbmassicotte changed the title from "Backup stuck InProgress; error getting backup resource list: timed out waiting for download URL" to "Backup stuck InProgress, using restic with AzureFile volume" on Sep 11, 2020
@jbmassicotte
Author

We figured out our problem: our cluster is composed of 3 nodepools: the default one, plus, let’s say, pools A and B. We have 2 applications, let’s say X and Y, and use ‘tolerations’ to force app X onto nodepool A and app Y onto nodepool B. Because the restic daemonset has no tolerations, its pods run only on the default nodepool, so restic fails to back up volumes from applications running on pools A and B.

To fix the problem (temporarily), I used kubectl edit daemonset/restic -n velero to add the needed toleration, which forced restic to run on all cluster nodes. Subsequent backups worked.
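
The same change can also be made non-interactively with `kubectl patch`. The following is a minimal sketch, not an official Velero procedure. The toleration key and value (`cpu`, `mydb`) are placeholders that must match the taint on your own node pools, and the `APPLY` switch is a hypothetical convenience so the patch can be inspected before it touches the cluster.

```shell
#!/bin/sh
# Placeholder taint values (assumptions): use the taint set on your node pools.
TOLERATION_KEY="${TOLERATION_KEY:-cpu}"
TOLERATION_VALUE="${TOLERATION_VALUE:-mydb}"

# Merge patch that sets the tolerations list on the daemonset's pod template.
# A merge patch replaces the whole list, so include every toleration you need.
PATCH=$(printf '{"spec":{"template":{"spec":{"tolerations":[{"key":"%s","operator":"Equal","value":"%s","effect":"NoSchedule"}]}}}}' \
    "$TOLERATION_KEY" "$TOLERATION_VALUE")

# Print the patch so it can be reviewed.
echo "$PATCH"

# Only touch the cluster when APPLY=1 is set (hypothetical safety switch).
if [ "${APPLY:-0}" = "1" ]; then
    kubectl patch daemonset restic -n velero --type merge -p "$PATCH"
fi
```

Running the script as-is only prints the patch; running it with `APPLY=1` applies it, after which the daemonset rolls out restic pods on the tainted nodes.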

Questions to the Velero experts: I need to make these changes permanent. How can I provide these changes to the ‘velero install’ command? Is there a way to provide a daemonset-restic.yaml file to ‘velero install’, and if so, where can I find the default file which I will use to add the toleration config?

@jbmassicotte
Author

I ended up writing a script to capture the daemonset yaml config, to add the toleration to this config via a sequence of sed updates, and to invoke ‘kubectl replace’ with the updated config. It does the trick but I find that somewhat cheesy. Any solution deemed more elegant and reliable would be appreciated.
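
A possibly less fragile alternative to rewriting the live object with sed is to have `velero install` emit its manifests instead of applying them, using its `--dry-run -o yaml` flags, and to keep the edited result under version control. The sketch below assumes that workflow; the function name `generate_velero_manifests` and the output file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write the manifests that `velero install` would apply
# to a file instead of creating them in the cluster.
# $1: output file; remaining arguments: the usual `velero install` flags.
generate_velero_manifests() {
    out_file="$1"
    shift
    # --dry-run -o yaml prints the resources rather than installing them.
    velero install "$@" --dry-run -o yaml > "$out_file"
}
```

Called with the same flags as the original install, for example `generate_velero_manifests velero-install.yaml --provider azure --use-restic` plus the bucket, secret, and location flags, this leaves a file in which a `tolerations` block can be added under `spec.template.spec` of the restic DaemonSet before running `kubectl apply -f velero-install.yaml`.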

@zubron changed the title from "Backup stuck InProgress, using restic with AzureFile volume" to "Allow velero install to specify tolerations for restic daemonset" on Oct 21, 2020
@zubron added the "Needs Product" label on Oct 21, 2020
@JarnoRFB

@jbmassicotte In case you can use the velero helm chart instead, it is possible to specify tolerations for the daemonset there https://github.com/vmware-tanzu/helm-charts/blob/main/charts/velero/values.yaml#L269.

This tripped me up a bit when I tried to do a restic backup of a pod running on a node where no restic daemon was running. I think it would be good behavior if the backup raised an error, or at least logged a warning in the velero logs, in this situation.
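
Until Velero reports this itself, a rough pre-flight check can compare the nodes hosting application pods with the nodes hosting restic pods. This is only a sketch: the helper name `nodes_missing_restic` is invented, and the `name=restic` label selector in the usage comments is an assumption about how the daemonset pods are labeled, so verify it with `kubectl get pods -n velero --show-labels` first.

```shell
#!/bin/sh
# Print the nodes that run application pods but no restic pod.
# $1: file with one application node name per line
# $2: file with one restic node name per line
nodes_missing_restic() {
    app_sorted=$(mktemp) || return 1
    restic_sorted=$(mktemp) || return 1

    # Drop blank lines (pods not yet scheduled), then sort and de-duplicate.
    sed '/^$/d' "$1" | LC_ALL=C sort -u > "$app_sorted"
    sed '/^$/d' "$2" | LC_ALL=C sort -u > "$restic_sorted"

    # comm -23 keeps the lines that appear only in the first file.
    LC_ALL=C comm -23 "$app_sorted" "$restic_sorted"

    rm -f "$app_sorted" "$restic_sorted"
}

# Usage (requires cluster access; the label selector is an assumption):
#   kubectl get pods -n mynamespace -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' > app-nodes.txt
#   kubectl get pods -n velero -l name=restic -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' > restic-nodes.txt
#   nodes_missing_restic app-nodes.txt restic-nodes.txt
```

Any node name printed is a node where annotated volumes would not be picked up by restic, which matches the stuck InProgress symptom described in this issue.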

@eleanor-millman added the "Reviewed Q2 2021" and "Icebox" labels on May 11, 2021
@stale

stale bot commented Jul 10, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale (bot) added the "staled" label on Jul 10, 2021
@JarnoRFB

Should this be unstaled, since it has already been marked as valuable?

@stale (bot) removed the "staled" label on Jul 12, 2021
@zubron added the "Restic" label on Jul 12, 2021
@arunvc

arunvc commented Apr 2, 2022

Backup stuck in InProgress status using restic, without any clue as to why.
velero install has no option for tolerations.

Thanks to @jbmassicotte, manual editing works:

kubectl edit daemonset/restic -n velero

E.g.:

      tolerations:
      - key: cpu
        operator: Equal
        value: mydb
        effect: NoSchedule

@kaovilai changed the title from "Allow velero install to specify tolerations for restic daemonset" to "Allow velero install to specify tolerations for daemonset" on Nov 26, 2024
@kaovilai self-assigned this on Nov 26, 2024