Reconcile related drpcs when drcluster is deleted #1168

Merged
merged 1 commit into from
Jan 12, 2024

@nirs nirs commented Jan 8, 2024

  • Watch also DRCluster delete events
  • When filtering drpcs consider all drpcs referencing a deleted drcluster.

With this change, when a drcluster is deleted, the drpc controller should update the VRG ManifestWork on the remaining cluster. Previously this happened minutes after a drcluster was deleted.

Testing:

  • image: quay.io/nirsof/ramen-operator:update-vrg-v1

Status:

  • Test disable dr flow with single drpc with drenv and busybox rbd deployment on top of Test demo #1153
  • Test disable dr flow with multiple apps (deploy, sts, ds)
  • Run basic test with 3 subscription rbd apps (deploy, sts, ds) to check for regression
  • Test on OCP/volsync
    • This small internal change should work on any replication/storage/platform. I'll do a quick test on OCP later.

- When filtering update events, consider also the update when the new
  object is marked for deletion.
- When filtering drpcs consider all drpcs referencing a deleted
  drcluster.

With this change, when a drcluster is deleted, the drpc controller
updates the VRG ManifestWork on the remaining cluster. Previously this
happened minutes after a drcluster was deleted.

Notes:

- We don't get a delete event when the drcluster is deleted, but an
  update event. I don't know if this is a bug in controller runtime or
  expected behavior.

Signed-off-by: Nir Soffer <[email protected]>
@nirs nirs marked this pull request as ready for review January 8, 2024 19:13

nirs commented Jan 8, 2024

Testing with drenv

  1. Deploy busybox rbd deployment (app running on dr1)
  2. Enable DR
  3. Suspend vm dr1
    virsh -c qemu:///system suspend dr1
    
  4. Failover to dr2
    • waiting for peer ready times out - expected
  5. Dump vrg on dr2 before deleting the cluster
    kubectl get vrg busybox-regional-rbd-deploy-drpc -n busybox-regional-rbd-deploy \
        -o yaml --context dr2 > vrg-before-delete-drcluster.yaml 
    
  6. Delete drcluster dr1
    kubectl delete drcluster dr1 --wait=false --context hub
    
  7. Dump vrg on dr2 after deleting the cluster
    kubectl get vrg busybox-regional-rbd-deploy-drpc -n busybox-regional-rbd-deploy \
        -o yaml --context dr2 > vrg-after-delete-drcluster.yaml 
    
  8. Dump ramen hub logs
    kubectl logs deploy/ramen-hub-operator -n ramen-system --context hub > ramen-hub.log
    

The VRG diff shows that minio-on-dr1 was removed

...
       appname: busybox
   replicationState: primary
   s3Profiles:
-  - minio-on-dr1
   - minio-on-dr2
   volSync: {}
 status:
   conditions:
...

Interesting events from ramen log

  1. Getting an update event when the drcluster was deleted
2024-01-08T18:46:06.190Z        INFO    DRPCPredicate.DRCluster controllers/drplacementcontrol_controller.go:275        Update event
2024-01-08T18:46:06.190Z        INFO    controllers/drplacementcontrol_controller.go:623        DRPC Map: Filtering DRCluster (dr1)
  2. New logs when filtering drpcs using this cluster
2024-01-08T18:46:06.190Z        INFO    DRPCFilter.DRCluster    controllers/drplacementcontrol_controller.go:405        Found DRPolicy referencing DRCluster    {"cluster": {"metadata":{"name":"dr1","uid":"215969da-fde4-4cbf-8c56-775523a90090","resourceVersion":"5709","generation":2,"creationTimestamp":"2024-01-08T18:38:30Z","deletionTimestamp":"2024-01-08T18:46:06Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Clean","message":"Cluster 
Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "drpolicy": "busybox-regional-rbd-deploy"}
2024-01-08T18:46:06.190Z        INFO    DRPCFilter.DRCluster    controllers/drplacementcontrol_controller.go:445        Found DRPC referencing drpolicy {"cluster": {"metadata":{"name":"dr1","uid":"215969da-fde4-4cbf-8c56-775523a90090","resourceVersion":"5709","generation":2,"creationTimestamp":"2024-01-08T18:38:30Z","deletionTimestamp":"2024-01-08T18:46:06Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Clean","message":"Cluster 
Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "name": "busybox-regional-rbd-deploy-drpc", "namespace": "busybox-regional-rbd-deploy", "drpolicy": "busybox-regional-rbd-deploy"}
2024-01-08T18:46:06.191Z        INFO    controllers.DRPlacementControl  controllers/drplacementcontrol_controller.go:677        Entering reconcile loop {"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "236ae8c1-8353-487b-a03a-c970fb608f01"}
  3. Updating the VRG manifest work without the minio-on-dr1 s3 profile
2024-01-08T18:46:06.206Z        INFO    controllers.DRPlacementControl  util/mw_util.go:121     Create or Update manifestwork busybox-regional-rbd-deploy-drpc:busybox-regional-rbd-deploy:dr2:{TypeMeta:{Kind:VolumeReplicationGroup APIVersion:ramendr.openshift.io/v1alpha1} ObjectMeta:{Name:busybox-regional-rbd-deploy-drpc GenerateName: Namespace:busybox-regional-rbd-deploy SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[drplacementcontrol.ramendr.openshift.io/destination-cluster:dr2] OwnerReferences:[] Finalizers:[] ManagedFields:[]} Spec:{PVCSelector:{MatchLabels:map[appname:busybox] MatchExpressions:[]} ReplicationState:primary S3Profiles:[minio-on-dr2] Async:0xc000cb3b80 Sync:<nil> VolSync:{RDSpec:[] Disabled:false} PrepareForFinalSync:false RunFinalSync:false Action:Failover KubeObjectProtection:<nil>} Status:{State: ProtectedPVCs:[] Conditions:[] ObservedGeneration:0 LastUpdateTime:0001-01-01 00:00:00 +0000 UTC KubeObjectProtection:{CaptureToRecoverFrom:<nil>} PrepareForFinalSyncComplete:false FinalSyncComplete:false LastGroupSyncTime:<nil> LastGroupSyncDuration:nil LastGroupSyncBytes:<nil>}} {"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "236ae8c1-8353-487b-a03a-c970fb608f01"}
2024-01-08T18:46:06.206Z        INFO    controllers.DRPolicy    util/secrets_util.go:541        Add Secret      {"DRPolicy": "busybox-regional-rbd-deploy", "rid": "4f9c75f5-588e-426e-93f1-54a8420ae381", "cluster": "dr2", "secret": "ramen-s3-secret-dr1"}
2024-01-08T18:46:06.206Z        INFO    controllers.DRPlacementControl  util/mw_util.go:499     Updating ManifestWork   {"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "236ae8c1-8353-487b-a03a-c970fb608f01", "name": "busybox-regional-rbd-deploy-drpc-busybox-regional-rbd-deploy-vrg-mw", "namespace": "dr2"}
2024-01-08T18:46:06.209Z        INFO    controllers.DRPolicy    util/secrets_util.go:541        Add Secret      {"DRPolicy": "busybox-regional-rbd-deploy", "rid": "4f9c75f5-588e-426e-93f1-54a8420ae381", "cluster": "dr2", "secret": "ramen-s3-secret-dr2"}

Timeline

  • 2024-01-08T18:46:06.190Z - detecting the deletion
  • 2024-01-08T18:46:06.206Z - updating manifest work

Logs and resources

update-vrg.tar.gz

nirs commented Jan 9, 2024

Testing multiple apps

Deploy and enable dr for 3 apps

Deploy 3 apps and relocate if needed so that all run on cluster dr1.

basic-test/deploy -c configs/k8s/busybox-regional-rbd-deploy.yaml envs/regional-dr.yaml
basic-test/enable-dr -c configs/k8s/busybox-regional-rbd-deploy.yaml envs/regional-dr.yaml
basic-test/deploy -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml
basic-test/enable-dr -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml
basic-test/relocate -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml
basic-test/deploy -c configs/k8s/busybox-regional-rbd-ds.yaml envs/regional-dr.yaml
basic-test/enable-dr -c configs/k8s/busybox-regional-rbd-ds.yaml envs/regional-dr.yaml

State before simulating disaster:

$ kubectl get drpc -A --context hub
NAMESPACE                     NAME                               AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-regional-rbd-deploy   busybox-regional-rbd-deploy-drpc   7m46s   dr1                                                 Deployed
busybox-regional-rbd-ds       busybox-regional-rbd-ds-drpc       7m28s   dr1                                                 Deployed
busybox-regional-rbd-sts      busybox-regional-rbd-sts-drpc      7m36s   dr1                                  Relocate       Relocated

Simulate disaster

$ virsh -c qemu:///system list
 Id   Name   State
----------------------
 10   dr1    running
 11   dr2    running
 12   hub    running

$ virsh -c qemu:///system suspend dr1
Domain 'dr1' suspended

Failover all apps

basic-test/failover -c configs/k8s/busybox-regional-rbd-deploy.yaml envs/regional-dr.yaml
basic-test/failover -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml
basic-test/failover -c configs/k8s/busybox-regional-rbd-ds.yaml envs/regional-dr.yaml

The failover will time out waiting for the PeerReady condition. This is
expected since dr1 is not available.

State after all apps failed over (Available condition met):

$ kubectl get drpc -A --context hub
NAMESPACE                     NAME                               AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-regional-rbd-deploy   busybox-regional-rbd-deploy-drpc   25m   dr1                dr2               Failover       FailedOver
busybox-regional-rbd-ds       busybox-regional-rbd-ds-drpc       25m   dr1                dr2               Failover       FailedOver
busybox-regional-rbd-sts      busybox-regional-rbd-sts-drpc      25m   dr1                dr2               Failover       FailedOver

$ kubectl get vrg,vr -A --context dr2
NAMESPACE                     NAME                                                                           DESIREDSTATE   CURRENTSTATE
busybox-regional-rbd-deploy   volumereplicationgroup.ramendr.openshift.io/busybox-regional-rbd-deploy-drpc   primary        Primary
busybox-regional-rbd-ds       volumereplicationgroup.ramendr.openshift.io/busybox-regional-rbd-ds-drpc       primary        Primary
busybox-regional-rbd-sts      volumereplicationgroup.ramendr.openshift.io/busybox-regional-rbd-sts-drpc      primary        Primary

NAMESPACE                     NAME                                                                  AGE     VOLUMEREPLICATIONCLASS   PVCNAME            DESIREDSTATE   CURRENTSTATE
busybox-regional-rbd-deploy   volumereplication.replication.storage.openshift.io/busybox-pvc        2m42s   vrc-sample               busybox-pvc        primary        Primary
busybox-regional-rbd-ds       volumereplication.replication.storage.openshift.io/busybox-pvc        2m21s   vrc-sample               busybox-pvc        primary        Primary
busybox-regional-rbd-sts      volumereplication.replication.storage.openshift.io/varlog-busybox-0   2m33s   vrc-sample               varlog-busybox-0   primary        Primary
busybox-regional-rbd-sts      volumereplication.replication.storage.openshift.io/varlog-busybox-1   2m18s   vrc-sample               varlog-busybox-1   primary        Primary

Dump VRG before deleting the drcluster

kubectl get vrg -A -o yaml > vrgs-before-delete-cluster.yaml

Delete drcluster

kubectl delete drcluster dr1 --wait=false --context hub

Dump VRG after deleting the drcluster

kubectl get vrg -A -o yaml > vrgs-after-delete-cluster.yaml

VRGs diff:

$ diff -u vrgs-before-delete-cluster.yaml vrgs-after-delete-cluster.yaml | grep -- '- minio-on-dr1'
-    - minio-on-dr1
-    - minio-on-dr1
-    - minio-on-dr1

Ramen hub logs:

Filtering DRPCs during update event with deleted drcluster:

$ kubectl logs deploy/ramen-hub-operator -n ramen-system --context hub | grep 'Found DRPC referencing drpolicy'
2024-01-09T13:09:50.988Z	INFO	DRPCFilter.DRCluster	controllers/drplacementcontrol_controller.go:445	Found DRPC referencing drpolicy	{"cluster": {"metadata":{"name":"dr1","uid":"60daba7c-f828-43fa-9b72-b2cb5dc5f49a","resourceVersion":"28069","generation":2,"creationTimestamp":"2024-01-09T12:39:04Z","deletionTimestamp":"2024-01-09T13:09:50Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:05Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster 
Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:05Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "name": "busybox-regional-rbd-deploy-drpc", "namespace": "busybox-regional-rbd-deploy", "drpolicy": "busybox-regional-rbd-deploy"}
2024-01-09T13:09:50.988Z	INFO	DRPCFilter.DRCluster	controllers/drplacementcontrol_controller.go:445	Found DRPC referencing drpolicy	{"cluster": {"metadata":{"name":"dr1","uid":"60daba7c-f828-43fa-9b72-b2cb5dc5f49a","resourceVersion":"28069","generation":2,"creationTimestamp":"2024-01-09T12:39:04Z","deletionTimestamp":"2024-01-09T13:09:50Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:05Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster 
Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:05Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "name": "busybox-regional-rbd-sts-drpc", "namespace": "busybox-regional-rbd-sts", "drpolicy": "busybox-regional-rbd-sts"}
2024-01-09T13:09:50.988Z	INFO	DRPCFilter.DRCluster	controllers/drplacementcontrol_controller.go:445	Found DRPC referencing drpolicy	{"cluster": {"metadata":{"name":"dr1","uid":"60daba7c-f828-43fa-9b72-b2cb5dc5f49a","resourceVersion":"28069","generation":2,"creationTimestamp":"2024-01-09T12:39:04Z","deletionTimestamp":"2024-01-09T13:09:50Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:05Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster 
Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:05Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "name": "busybox-regional-rbd-ds-drpc", "namespace": "busybox-regional-rbd-ds", "drpolicy": "busybox-regional-rbd-ds"}

Updating the VRG manifest work

2024-01-09T13:09:51.003Z        INFO    controllers.DRPlacementControl  util/mw_util.go:499     Updating ManifestWork   {"DRPC": "busybox-regional-rbd-ds/busybox-regional-rbd-ds-drpc", "rid": "1f75b7d4-4471-4fd9-b5af-8849c8268752", "name": "busybox-regional-rbd-ds-drpc-busybox-regional-rbd-ds-vrg-mw", "namespace": "dr2"}
...
2024-01-09T13:09:51.004Z        INFO    controllers.DRPlacementControl  util/mw_util.go:499     Updating ManifestWork   {"DRPC": "busybox-regional-rbd-sts/busybox-regional-rbd-sts-drpc", "rid": "f762233f-78ce-42cd-a9b3-b6f2ba874a93", "name": "busybox-regional-rbd-sts-drpc-busybox-regional-rbd-sts-vrg-mw", "namespace": "dr2"}
...
2024-01-09T13:09:51.004Z        INFO    controllers.DRPlacementControl  util/mw_util.go:499     Updating ManifestWork   {"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "18b281c3-1d3d-4ac4-97d1-926ae3915037", "name": "busybox-regional-rbd-deploy-drpc-busybox-regional-rbd-deploy-vrg-mw", "namespace": "dr2"}

Disabling DR for all apps

$ basic-test/disable-dr -c configs/k8s/busybox-regional-rbd-deploy.yaml envs/regional-dr.yaml
2024-01-09 15:23:07,035 INFO    [disable-dr] Disable DR
2024-01-09 15:23:07,082 INFO    [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/busybox-regional-rbd-deploy-drpc'
2024-01-09 15:23:26,325 INFO    [disable-dr] Enabling OCM scheduling for 'placement.cluster.open-cluster-management.io/busybox-placement'
2024-01-09 15:23:26,388 INFO    [disable-dr] DR was disabled
$ basic-test/disable-dr -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml
2024-01-09 15:23:18,388 INFO    [disable-dr] Disable DR
2024-01-09 15:23:18,436 INFO    [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/busybox-regional-rbd-sts-drpc'
2024-01-09 15:23:36,512 INFO    [disable-dr] Enabling OCM scheduling for 'placement.cluster.open-cluster-management.io/busybox-placement'
2024-01-09 15:23:36,573 INFO    [disable-dr] DR was disabled
$ basic-test/disable-dr -c configs/k8s/busybox-regional-rbd-ds.yaml envs/regional-dr.yaml
2024-01-09 15:23:39,791 INFO    [disable-dr] Disable DR
2024-01-09 15:23:39,839 INFO    [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/busybox-regional-rbd-ds-drpc'
2024-01-09 15:23:44,357 INFO    [disable-dr] Enabling OCM scheduling for 'placement.cluster.open-cluster-management.io/busybox-placement'
2024-01-09 15:23:44,419 INFO    [disable-dr] DR was disabled

State after disabling dr:

$ kubectl get deploy,pod,pvc -n busybox-regional-rbd-deploy --context dr2
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/busybox   1/1     1            1           24m

NAME                           READY   STATUS    RESTARTS   AGE
pod/busybox-7c4d67bf49-sc89x   1/1     Running   0          24m

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-da2d3812-ce22-4421-a58e-75d9fb79b4ca   1Gi        RWO            rook-ceph-block   24m

$ kubectl get sts,pod,pvc -n busybox-regional-rbd-sts --context dr2
NAME                       READY   AGE
statefulset.apps/busybox   2/2     24m

NAME            READY   STATUS    RESTARTS   AGE
pod/busybox-0   1/1     Running   0          24m
pod/busybox-1   1/1     Running   0          23m

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/varlog-busybox-0   Bound    pvc-eb2764c0-8228-45bb-a9ad-18cb227ee276   1Gi        RWO            rook-ceph-block   25m
persistentvolumeclaim/varlog-busybox-1   Bound    pvc-60de6fee-ae33-4f27-ba02-6538acbdcba8   1Gi        RWO            rook-ceph-block   25m

$ kubectl get ds,pod,pvc -n busybox-regional-rbd-ds --context dr2
NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/busybox   1         1         1       1            1           <none>          24m

NAME                READY   STATUS    RESTARTS   AGE
pod/busybox-w8d8g   1/1     Running   0          24m

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-9937cd1b-dd65-4c6d-a78d-849f0318bae8   1Gi        RWO            rook-ceph-block   25m

Logs and resources

update-vrg-multi-apps.tar.gz

netzzer commented Jan 12, 2024

Tested Nir's ramen image (quay.io/nirsof/ramen-operator:update-vrg-v1) today with the fix described in this PR, using downstream ODF 4.15 build 112, OCP 4.14.6, and ACM 2.9.1. The ramen hub pod log was collected ~10-15 minutes after drcluster perf2 was deleted.

1) failed all nodes for perf2
2) Failover 4 apps to perf3 (busybox with cephrbd appset and subscription; busybox with cephfs appset and subscription)
3) deleted drcluster perf2
4) checked apps and perf2 s3Profile deleted in seconds from VRGs

Check on perf3
$ oc get vrg  -o yaml -A | grep -A 2 s3P
    s3Profiles:
    - s3profile-perf3-ocs-storagecluster
    volSync: {}
--
    s3Profiles:
    - s3profile-perf3-ocs-storagecluster
    volSync: {}
--
    s3Profiles:
    - s3profile-perf3-ocs-storagecluster
    volSync: {}
--
    s3Profiles:
    - s3profile-perf3-ocs-storagecluster
    volSync: {}

Logs: ramen-hub-operator-log.txt.gz

@ShyamsundarR ShyamsundarR left a comment

LGTM! Just one observation on the delete event that may need to be resolved.

- // Exhausted all failover activation checks, this update is NOT of interest
- return false
+ // Exhausted all failover activation checks, the only interesting update is deleting a drcluster.
+ return drClusterIsDeleted(newDRCluster)

I am unsure whether deleting a resource also triggers an Update event. I would have just returned true from line 269 above, which is the predicate function called when the watched resource is deleted.

@BenamarMk BenamarMk Jan 12, 2024

I agree with Shyam. You are better off returning true on line 269 and allowing all drpcs to reconcile. Deleting a drcluster should NOT be a common use case.

Member Author

I started this change by returning true in line 269, but unfortunately this does not work.

Deleting a resource first creates an update event in which the new object has a deletionTimestamp. Only when the object is actually removed from the system do we get a delete event.

In our case this event is not relevant. It will happen when the drpolicy is deleted and ramen removes the finalizers from the drcluster.

@BenamarMk BenamarMk Jan 12, 2024

I see, so adding the deletionTimestamp ends up firing an Update event. Got it.
So how about, in line 307, returning true every time objectNew (newDRCluster) has a non-zero deletionTimestamp?
I think avoiding the new code (DRPCsUsingDRCluster and DRPCsUsingDRPolicy) is desirable.

Member Author

I did not try the change without modifying FilterDRCluster, but if we don't modify it we will trigger a reconcile only for the DRPCs that are failing over to the deleted cluster, and there are never any such DRPCs.

We can simplify by returning all DRPCs, but if we have to add code, it makes sense to add the right code, which is only a few more lines.

@ShyamsundarR ShyamsundarR merged commit 2b66f98 into RamenDR:main Jan 12, 2024
14 checks passed