Velero restore completes with warning while restoring services and endpoints backup #6280

Closed
pavansokkenagaraj opened this issue May 17, 2023 · 8 comments · Fixed by #6315

Comments

@pavansokkenagaraj

What steps did you take and what happened:

In Velero v1.10.3 and v1.11.0, restoring a backup that contains a Deployment, a Service, and Endpoints completes with a warning:

time="2023-05-17T13:48:49Z" level=warning msg="Namespace test, resource restore warning: could not restore, Endpoints \"example-nginx\" already exists. Warning: the in-cluster version is different than the backed-up version." logSource="pkg/controller/restore_controller.go:509" restore=velero/qakotsregression-g5mw7

What did you expect to happen:
Velero restore completes without warning

The following information will help us better understand what's going on:
In Velero v1.11.0 and v1.10.3, the services resource was added to HighPriorities, which causes Services to be restored before Endpoints. When the Endpoints are subsequently restored, there is a conflict because the Subsets.Addresses of the Endpoints that Kubernetes creates for the new Service differ from the backed-up version.
Corresponding code:

var defaultRestorePriorities = restore.Priorities{
	HighPriorities: []string{
		"customresourcedefinitions",
		"namespaces",
		"storageclasses",
		"volumesnapshotclass.snapshot.storage.k8s.io",
		"volumesnapshotcontents.snapshot.storage.k8s.io",
		"volumesnapshots.snapshot.storage.k8s.io",
		"persistentvolumes",
		"persistentvolumeclaims",
		"serviceaccounts",
		"secrets",
		"configmaps",
		"limitranges",
		"pods",
		// we fully qualify replicasets.apps because prior to Kubernetes 1.16, replicasets also
		// existed in the extensions API group, but we back up replicasets from "apps" so we want
		// to ensure that we prioritize restoring from "apps" too, since this is how they're stored
		// in the backup.
		"replicasets.apps",
		"clusterclasses.cluster.x-k8s.io",
		"services",
	},
	LowPriorities: []string{
		"clusterbootstraps.run.tanzu.vmware.com",
		"clusters.cluster.x-k8s.io",
		"clusterresourcesets.addons.cluster.x-k8s.io",
	},
}

Whereas in Velero v1.10.2, the restore completed without warnings: Services and Endpoints were not in HighPriorities, so they were restored after the high-priority resources in alphabetical order (Endpoints before Services):

var defaultRestorePriorities = restore.Priorities{
	HighPriorities: []string{
		"customresourcedefinitions",
		"namespaces",
		"storageclasses",
		"volumesnapshotclass.snapshot.storage.k8s.io",
		"volumesnapshotcontents.snapshot.storage.k8s.io",
		"volumesnapshots.snapshot.storage.k8s.io",
		"persistentvolumes",
		"persistentvolumeclaims",
		"serviceaccounts",
		"secrets",
		"configmaps",
		"limitranges",
		"pods",
		// we fully qualify replicasets.apps because prior to Kubernetes 1.16, replicasets also
		// existed in the extensions API group, but we back up replicasets from "apps" so we want
		// to ensure that we prioritize restoring from "apps" too, since this is how they're stored
		// in the backup.
		"replicasets.apps",
		"clusterclasses.cluster.x-k8s.io",
	},
	LowPriorities: []string{
		"clusterbootstraps.run.tanzu.vmware.com",
		"clusters.cluster.x-k8s.io",
		"clusterresourcesets.addons.cluster.x-k8s.io",
	},
}

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, refer to velero debug --help.

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero describe restore qakotsregression-g5mw7 --details
$ velero describe restore qakotsregression-g5mw7 --details
Name:         qakotsregression-g5mw7
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  9
Items restored:              9

Started:    2023-05-17 13:48:48 +0000 UTC
Completed:  2023-05-17 13:48:49 +0000 UTC

Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    test:  could not restore, Endpoints "example-nginx" already exists. Warning: the in-cluster version is different than the backed-up version.

Backup:  qakotsregression-g5mw7

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
  Cluster-scoped:  included

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  true

Existing Resource Policy:   <none>
ItemOperationTimeout:       1h0m0s

Preserve Service NodePorts:  auto

Resource List:
  apps/v1/Deployment:
    - test/example-nginx(created)
  apps/v1/ReplicaSet:
    - test/example-nginx-85b445fb65(created)
  discovery.k8s.io/v1/EndpointSlice:
    - test/example-nginx-gbvm2(created)
  v1/ConfigMap:
    - test/example-config(created)
  v1/Endpoints:
    - test/example-nginx(failed)
  v1/Pod:
    - test/example-nginx-85b445fb65-n5b2v(created)
  v1/Secret:
    - test/kotsadm-replicated-registry(created)
    - test/qakotsregression-registry(created)
  v1/Service:
    - test/example-nginx(created)
  • velero restore logs qakotsregression-g5mw7
...
...

time="2023-05-17T13:48:49Z" level=info msg="Getting client for /v1, Kind=Service" logSource="pkg/restore/restore.go:918" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="restore status includes excludes: <nil>" logSource="pkg/restore/restore.go:1189" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Executing item action for services" logSource="pkg/restore/restore.go:1196" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Attempting to restore Service: example-nginx" logSource="pkg/restore/restore.go:1337" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="the managed fields for test/example-nginx is patched" logSource="pkg/restore/restore.go:1522" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Restored 6 items out of an estimated total of 9 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:669" name=example-nginx namespace=test progress= resource=services restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Getting client for apps/v1, Kind=Deployment" logSource="pkg/restore/restore.go:918" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="restore status includes excludes: <nil>" logSource="pkg/restore/restore.go:1189" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Executing item action for deployments.apps" logSource="pkg/restore/restore.go:1196" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Executing ChangeImageNameAction" cmd=/velero logSource="pkg/restore/change_image_name_action.go:68" pluginName=velero restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Done executing ChangeImageNameAction" cmd=/velero logSource="pkg/restore/change_image_name_action.go:81" pluginName=velero restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Attempting to restore Deployment: example-nginx" logSource="pkg/restore/restore.go:1337" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="the managed fields for test/example-nginx is patched" logSource="pkg/restore/restore.go:1522" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Restored 7 items out of an estimated total of 9 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:669" name=example-nginx namespace=test progress= resource=deployments.apps restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Getting client for /v1, Kind=Endpoints" logSource="pkg/restore/restore.go:918" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="restore status includes excludes: <nil>" logSource="pkg/restore/restore.go:1189" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Attempting to restore Endpoints: example-nginx" logSource="pkg/restore/restore.go:1337" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Restored 8 items out of an estimated total of 9 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:669" name=example-nginx namespace=test progress= resource=endpoints restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Getting client for discovery.k8s.io/v1, Kind=EndpointSlice" logSource="pkg/restore/restore.go:918" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="restore status includes excludes: <nil>" logSource="pkg/restore/restore.go:1189" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Attempting to restore EndpointSlice: example-nginx-gbvm2" logSource="pkg/restore/restore.go:1337" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="the managed fields for test/example-nginx-gbvm2 is patched" logSource="pkg/restore/restore.go:1522" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Restored 9 items out of an estimated total of 9 (estimate will change throughout the restore)" logSource="pkg/restore/restore.go:669" name=example-nginx-gbvm2 namespace=test progress= resource=endpointslices.discovery.k8s.io restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Waiting for all pod volume restores to complete" logSource="pkg/restore/restore.go:551" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Done waiting for all pod volume restores to complete" logSource="pkg/restore/restore.go:567" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Waiting for all post-restore-exec hooks to complete" logSource="pkg/restore/restore.go:571" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="Done waiting for all post-restore exec hooks to complete" logSource="pkg/restore/restore.go:579" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=warning msg="Namespace test, resource restore warning: could not restore, Endpoints \"example-nginx\" already exists. Warning: the in-cluster version is different than the backed-up version." logSource="pkg/controller/restore_controller.go:509" restore=velero/qakotsregression-g5mw7
time="2023-05-17T13:48:49Z" level=info msg="restore completed" logSource="pkg/controller/restore_controller.go:512" restore=velero/qakotsregression-g5mw7

Anything else you would like to add:

Environment:

  • Velero version (use velero version): v1.11.0 and v1.10.3
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version): v1.27
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@sseago
Collaborator

sseago commented May 17, 2023

That warning just means that the resource already existed before the restore was executed. By default, Velero doesn't attempt to modify resources to be restored if they already exist, so there will be a warning if the resource already exists but its content differs from what was in the backup. If you need the version from the backup instead of what's already in the cluster, you have a couple of options:

  1. delete the resource before restoring
  2. set the existing resource policy to update -- when this is set, Velero will attempt to update resources that already exist rather than just warning and moving on. In some cases this will fail: if the resource is immutable, or one of the fields that differs is an immutable field, Velero falls back to warning the user that it couldn't make the change (see the example command below). Documentation for this feature is here: https://velero.io/docs/v1.11/restore-reference/#restore-existing-resource-policy
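
A minimal sketch of option 2 from the CLI, assuming the --existing-resource-policy flag introduced in Velero v1.9 (the restore name here is illustrative, not from this issue):

# Create a restore that updates existing resources instead of only warning
# when the in-cluster version differs from the backed-up version.
$ velero restore create example-restore-update \
      --from-backup qakotsregression-g5mw7 \
      --existing-resource-policy=update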

@pavansokkenagaraj
Author

@sseago

  1. delete the resource before restoring

The Endpoints and Services are deleted before the restore. What happens is:

  1. When Velero restores the Service first, the Kubernetes API creates new Endpoints for the Service, since none exist yet.
  2. Later, Velero restores the backed-up Endpoints; at this point the Endpoints created in the previous step already exist and their content differs, so the restore returns a warning.

  2. set the existing resource policy to update -- when this is set, Velero will attempt to update resources that already exist rather than just warning and moving on. In some cases this will fail: if the resource is immutable, or one of the fields that differs is an immutable field, Velero falls back to warning the user that it couldn't make the change. Documentation for this feature is here: https://velero.io/docs/v1.11/restore-reference/#restore-existing-resource-policy

This would be a regression, as the behaviour changed from v1.10.2 to v1.10.3/v1.11.0.
In Velero v1.10.2, with resource policy <none>, Velero restored the resources without any warnings (it restored Endpoints first, then Services).

So, in versions v1.10.3/v1.11.0 and later, Velero restores Services first and Endpoints later, and the warning will always occur because the Kubernetes API reconciles the Service and creates new Endpoints that later conflict with the Endpoints being restored.
Is this the expected behaviour of Velero restore for versions starting from v1.10.3/v1.11.0?

Shouldn't Endpoints be restored before Services (by adding endpoints to HighPriorities before services), so that the restore behaves like versions up to v1.10.2?

@sgalsaleh
Contributor

As @pavansokkenagaraj mentioned, I also think that Endpoints need to be restored before Services, as Kubernetes automatically creates Endpoints for a Service when the corresponding Pods exist. Since Pods are also in the high-priority list and are restored before Services, Endpoints will have to be restored before Services as well.

@sseago
Collaborator

sseago commented May 17, 2023

Ahh, I see now. So it looks like services were added as a high priority resource but endpoints were not. So we need endpoints added to high priority before service. Also, it looks like this only worked before by luck -- because Endpoints happen to fall earlier in the alphabetical sort.

@pavansokkenagaraj
Author

Also, it looks like this only worked before by luck -- because Endpoints happen to fall earlier in the alphabetical sort.

Yes. 🤞🏽

@reasonerjt
Contributor

I believe the change was introduced purposefully.

@ywk253100 please clarify.

@ywk253100
Contributor

The service was put into the high-priority list to fix an issue with the AKO operator, but it seems the Endpoints issue is a regression.

@sseago
Collaborator

sseago commented May 22, 2023

Right. Adding service to high priority is fine. But I think we need to add endpoint as well, right before service, because there are use cases where endpoints must be restored before services.
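
For illustration, a minimal sketch of what that change could look like in the default priority list (this follows the direction of the fix merged in #6315; the exact code may differ):

var defaultRestorePriorities = restore.Priorities{
	HighPriorities: []string{
		// ... earlier entries unchanged ...
		"replicasets.apps",
		"clusterclasses.cluster.x-k8s.io",
		// Restore Endpoints before Services so the backed-up Endpoints are
		// created before the endpoints controller reconciles the new Service.
		"endpoints",
		"services",
	},
	LowPriorities: []string{
		"clusterbootstraps.run.tanzu.vmware.com",
		"clusters.cluster.x-k8s.io",
		"clusterresourcesets.addons.cluster.x-k8s.io",
	},
}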

ywk253100 added commits referencing this issue (May 29 to Jun 20, 2023):
Restore Endpoints before Services

Fixes #6280

Signed-off-by: Wenkai Yin(尹文开) <[email protected]>