Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix disable dr when VR failed validation #1570

Merged
merged 7 commits into from
Oct 1, 2024

Conversation

nirs
Copy link
Member

@nirs nirs commented Sep 23, 2024

When deleting a primary VRG, we wait until the VR Completed condition is
met. However if a VR precondition failed, for example using a drpolicy
without flattening enabled when the PVC needs flattening, the VR will
never complete and the vrg and drpc deletion will never complete.

Since csi-addons 0.10.0 we have a new Validated VR condition, set to
true if pre conditions are met, and false if not. VR is can be deleted
safely in this state, since mirroring was not enabled.

This changes modifies deleted VRG processing to check the new VR
Validated status. If the condition exist and the condition status is
false, validateVRStatus() return true, signaling that the VR is in the
desired state, and ramen completes the delete flow.

If the VR does not report the Validated condition (e.g. old csi-addon
version) or the condition status is true (mirroring in progress), we
continue in the normal flow. The VR will be deleted only when the
Completed condition status is true.

Tested locally with discovered app using a pvc created from a volume
snapshot.

Status:

@nirs nirs force-pushed the vr-validated branch 12 times, most recently from e092710 to 2a6d572 Compare September 23, 2024 22:38
@nirs nirs changed the title Handle VR that failed validation during VRG deletion Fix disable dr when VR failed validation Sep 23, 2024
@nirs nirs marked this pull request as ready for review September 24, 2024 14:04
@nirs nirs force-pushed the vr-validated branch 2 times, most recently from 39d7efa to 26c7680 Compare September 24, 2024 17:21
@nirs
Copy link
Member Author

nirs commented Sep 24, 2024

New logs during disabling dr:

2024-09-24T19:42:04.755Z    INFO    controllers.VolumeReplicationGroup.vrginstance  controller/vrg_volrep.go:1428   VolumeReplication restored-pvc/flatten failed validation and can be deleted {"VolumeReplicationGroup": {"name":"flatten-drpc","namespace":"ramen-ops"}, "rid": "12b2da9d-8c78-40a8-893f-574ed2ea26ca", "State": "primary", "Finalize": true}

2024-09-24T19:42:04.755Z    INFO    controllers.VolumeReplicationGroup.vrginstance  controller/vrg_volrep.go:1839   Deleting VolumeReplication resource %s/%s   {"VolumeReplicationGroup": {"name":"flatten-drpc","namespace":"ramen-ops"}, "rid": "12b2da9d-8c78-40a8-893f-574ed2ea26ca", "State": "primary", "Finalize": true, "flatten": "restored-pvc"}

2024-09-24T19:42:04.884Z    INFO    controllers.VolumeReplicationGroup.vrginstance  controller/vrg_volrep.go:449    Deleting ramen annotations from PersistentVolume    {"VolumeReplicationGroup": {"name":"flatten-drpc","namespace":"ramen-ops"}, "rid": "615ab075-dfbf-4ad4-bba8-b016104463f2", "State": "primary", "Finalize": true, "pvc": "flatten/restored-pvc", "pv": "pvc-94c88954-8764-4e88-a438-158efa0a6a3d"}                                                                                                                                

2024-09-24T19:42:04.888Z    INFO    controllers.VolumeReplicationGroup.vrginstance  controller/vrg_volrep.go:467    Deleting ramen annotations and finallizers from PersistentVolumeClaim   {"VolumeReplicationGroup": {"name":"flatten-drpc","namespace":"ramen-ops"}, "rid": "615ab075-dfbf-4ad4-bba8-b016104463f2", "State": "primary", "Finalize": true, "pvc": "flatten/restored-pvc", "owner removed": true, "finalizer removed": true}

Complete log:
disable-dr.log

internal/controller/vrg_volrep.go Outdated Show resolved Hide resolved
internal/controller/vrg_volrep.go Outdated Show resolved Hide resolved
internal/controller/vrg_volrep.go Outdated Show resolved Hide resolved
) bool {
// Check validated for primary during VRG deletion.
if state == ramendrv1alpha1.Primary && rmnutil.ResourceIsDeleted(v.instance) {
validated, ok := v.validateVRValidatedStatus(volRep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a straight call to check the condition, do we need a wrapper function for this?

Suggested change
validated, ok := v.validateVRValidatedStatus(volRep)
conditionMet, errorMsg := isVRConditionMet(volRep, volrep.ConditionValidated, metav1.ConditionTrue)
if !conditionMet && errorMsg == "" {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can update the DataReady and Protected conditions since we have more specific error message. I remvoed this update since many tests failed, I think the issue is wrong validation in the test and it is not related to this update, but I focused on solving the main problem. We can improve this in the next step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, v.validateVRValidatedStatus is a good place to log errors about missing, stale, or unknown condition, similar to validateCompletedStatus and the other helper for secondary vrg.

Was forgetten when we replaced minikube volumesnapshot addons.

Signed-off-by: Nir Soffer <[email protected]>
The comment is still unclear, but at least easier to read.

Signed-off-by: Nir Soffer <[email protected]>
- Replace "VR under deletion" with "deleted VR", matching the
  terminology used in the code.
- Replace "detected as missing or deleted" with "is missing" when we
  know the VR does not exist.

Signed-off-by: Nir Soffer <[email protected]>
Moved from validateVRStatus() to make room from checking VR VAlidated
status. This also make the function easier to understand, keeping the
same level of abstraction and getting rid of the uninteresting details.

Signed-off-by: Nir Soffer <[email protected]>
Rename msg to errorMsg and document that this is an error message,
if the we could not get the condition value, because it is missing,
stale, or unknown.

Signed-off-by: Nir Soffer <[email protected]>
Log after important changes to the system in delete VR flow to make it
easier to understand what the system is doing, and how ramen changed the
system.

New logs:
- delete the VR resource
- remove annotations from PV

Improve logs:
- remove annotations, labels, and finalizers from PVC

Signed-off-by: Nir Soffer <[email protected]>
When deleting a primary VRG, we wait until the VR Completed condition is
met. However if a VR precondition failed, for example using a drpolicy
without flattening enabled when the PVC needs flattening, the VR will
never complete and the vrg and drpc deletion will never complete.

Since csi-addons 0.10.0 we have a new Validated VR condition, set to
true if pre conditions are met, and false if not. VR is can be deleted
safely in this state, since mirroring was not enabled.

This changes modifies deleted VRG processing to check the new VR
Validated status. If the condition exist and the condition status is
false, validateVRStatus() return true, signaling that the VR is in the
desired state, and ramen completes the delete flow.

If the VR does not report the Validated condition (e.g. old csi-addon
version) or the condition status is true (mirroring in progress), we
continue in the normal flow. The VR will be deleted only when the
Completed condition status is true.

Tested with discovered deployment and vm using a pvc created from a
volume snapshot.

Signed-off-by: Nir Soffer <[email protected]>
@ShyamsundarR ShyamsundarR merged commit 5dfc74a into RamenDR:main Oct 1, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants