
Velero restored pods' schedule behaviors are different from Kubernetes scheduler #6945

Open
Lyndon-Li opened this issue Oct 12, 2023 · 4 comments
Assignees
Labels
Needs triage We need discussion to understand problem and decide the priority Restore

Comments

@Lyndon-Li
Contributor

Lyndon-Li commented Oct 12, 2023

During restore, Velero removes the objects' ownerReferences field, including on Pod objects. However, a Pod's ownerReferences significantly affect the Kubernetes scheduler: for StatefulSet/ReplicaSet pods, the scheduler spreads the pods as evenly as possible across nodes, and it applies this strategy to StatefulSet/ReplicaSet pods only.
If a pod's ownerReferences are removed, the scheduler has no way to identify the pod as part of a StatefulSet/ReplicaSet, so the spreading strategy is not applied to pods restored by Velero.

The consequence from the user's perspective is that when restoring a StatefulSet/ReplicaSet/Deployment, the pods will likely be scheduled onto the same node instead of being spread evenly across nodes, and pod spread heavily impacts the quality of HA.
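A simplified sketch (not actual kube-scheduler or Velero code; node and pod names are made up) of why the default spreading logic needs ownerReferences: pods are grouped by their controller's UID, and the scheduler prefers the node with the fewest pods from the same group. With the ownerReference stripped, every node scores the same and pods can pile onto one node:

```python
from collections import Counter

def preferred_node(pod, nodes, scheduled_pods):
    """Pick the node with the fewest pods sharing this pod's controller."""
    owner = pod.get("ownerReference")  # e.g. the owning ReplicaSet's UID
    if owner is None:
        # No ownerReference (as after a Velero restore): every node
        # looks equally good, so all pods land on the first node.
        return nodes[0]
    counts = Counter(
        p["node"] for p in scheduled_pods
        if p.get("ownerReference") == owner
    )
    return min(nodes, key=lambda n: counts.get(n, 0))

nodes = ["node-a", "node-b", "node-c"]

# With ownerReferences intact, three replicas spread across three nodes:
scheduled = []
for name in ["web-0", "web-1", "web-2"]:
    pod = {"name": name, "ownerReference": "rs-1234"}
    pod["node"] = preferred_node(pod, nodes, scheduled)
    scheduled.append(pod)
print([p["node"] for p in scheduled])  # ['node-a', 'node-b', 'node-c']

# With ownerReferences stripped, all three land on the same node:
scheduled = []
for name in ["web-0", "web-1", "web-2"]:
    pod = {"name": name}
    pod["node"] = preferred_node(pod, nodes, scheduled)
    scheduled.append(pod)
print([p["node"] for p in scheduled])  # ['node-a', 'node-a', 'node-a']
```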

@Lyndon-Li Lyndon-Li self-assigned this Oct 12, 2023

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

@cdtzabra

This results in orphaned pods that are not attached to their ReplicaSets.

And for StatefulSet apps, there is the issue of attaching a PVC to the right pods.

@reasonerjt reasonerjt added the Needs triage We need discussion to understand problem and decide the priority label Mar 4, 2024
@reasonerjt
Contributor

reasonerjt commented Mar 4, 2024

This results in orphaned pods that are not attached to their ReplicaSets.

I believe the controller should be able to reconcile, so that eventually the ReplicaSet's criteria are met?

And for StatefulSet apps, there is the issue of attaching a PVC to the right pods.

Would you mind opening another issue to elaborate on this? I don't think Velero can be used as a deployer to "redeploy" any application during restore out of the box, but if it's specific to StatefulSets, there may be something that can be enhanced.

@cdtzabra

cdtzabra commented Mar 4, 2024

Hi @reasonerjt

No, the controller doesn't do anything for file-system restores with Restic or Kopia (`--default-volumes-to-fs-backup`).

If you want the PVC/PV data, you have to restore the pods. However, if, in the same restore, you also keep the Deployment/StatefulSet/etc. (with or without the ReplicaSet), new pods will be spawned that are unable to use the PVCs, since the PVCs are still attached to the restored pods.

The only workaround in this case is to delete the restored pods (since they don't belong to the ReplicaSet and are therefore orphaned) to free up the PVCs.

I migrated from Velero/Restic to Velero/Kopia, but the behavior is still the same.

As a result, I had to divide my restore (from cluster x to cluster y) into several steps:

  1. Restore only the pods + PVCs.
  2. Check that the data is restored, then delete the pods.
  3. Restore the rest of the workload without the pods.
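The steps above could be sketched with the velero CLI roughly as follows (backup/restore names, the namespace, and the label selector are placeholders; the `--include-resources`/`--exclude-resources` flags are standard `velero restore create` options):

```shell
# 1. Restore only the pods and their volumes first.
velero restore create step1-pods-pvc \
  --from-backup my-backup \
  --include-resources pods,persistentvolumeclaims,persistentvolumes

# 2. Once the volume data is back, delete the orphaned pods to free the PVCs.
kubectl delete pods -n my-namespace -l app=my-app

# 3. Restore the remaining workload objects without the pods.
velero restore create step3-workload \
  --from-backup my-backup \
  --exclude-resources pods
```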

some related issues:
