-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate VR condition error message to protected pvc conditions #1639
Conversation
a84c11b
to
c496fe9
Compare
Works for VR conditions other then Validated. We propagate the messages form the VR conditions: VR:
VRG:
Missing change: when Validated condition is False, we want to set the DataReady condition and DataProtected using the error message from the Validated condition. Currently we use the Validated condition only for checking if the VR is finished and can be removed. |
a0d18cc
to
54e7e67
Compare
Propgartion to protected pvcs message works now for all VR conditions:
But we have 25 failed unit tests, need to understand why they fail. |
We don't propagate the protected pvcs conditions to the drpc, so on the hub this does not help to debug the issue. Maybe we can add list or errors messages from protected pvcs to make it easier to debug.
|
d958c57
to
f84aa7f
Compare
internal/controller/vrg_volrep.go
Outdated
v.log.Info(fmt.Sprintf("%s (VolRep: %s/%s)", errorMsg, volRep.GetName(), volRep.GetNamespace())) | ||
// Check the condition, but update the pvc conditions only if the check was successful. | ||
conditionMet, ok, errorMsg := isVRConditionMet(volRep, volrep.ConditionValidated, metav1.ConditionTrue) | ||
if !conditionMet && ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if isVRConditionMet
fails, and returns !met && !ok
, then this if
will evaluate to false because !(!met) && !ok
is false. That is !(false) && false
It seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ok is false, we don't know the condition value. The condition is missing, the observed generation does not match the object generation, or the value is unknown. For deleting the vr during deletion, we must wait for Status=False. When not deleting, we need to wait and try agin later.
What is wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the error is ignored then. Meaning, we will not enter the if
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is; if this line fails, then we ignore the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an error, we don't know the value of Validated condition. This was the behavior before this change.
We can treat this as an error, since before validated becomes true nothing will work, and it is a quick check, so we can set the conditions and skip checking the next condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I can keep the log for the error message, so we report missing, stale or unknown status, without affecting the conditions.
But note that this was logged only when deleting the vrg because previously we called this only during delete. Now we always call this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's why I said either log it or remove those error log statements, line 1619, 1626, and 1633. But don't just throw it away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simplify if instead of returning ok boolean, we will return enum:
- ConditionMissing
- ConditionStale
- ConditionUnknown
- ConditionAvailable
With this I can update DataReady and DataProtected for all states except ConditionMissing, and I can return true during deletion only for ConditionAvailable.
Example cases:
state | action |
---|---|
ConditionMissing | check next condition |
ConditionStale | update pvc conditions and return false |
ConditionUnknown | update pvc conditions and return false |
ConditionAvailable | if condition not met return true if deleting, otherwise false. If true check next condition |
With this validateVRStatus will look like the other helpers, always logging and setting the pv conditions, unless the condition is missing (old csi-addons).
@BenamarMk what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s definitely a better long-term solution. However, it requires a lot of changes. You might want to focus on addressing just the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to return condition state is simple. Handling the condition is little bit more complicated but is more correct.
2f93dcf
to
fced314
Compare
@nirs I not very sure on how the final VRG will look like, can you please point out the comment from above? |
This comment show the change in the vrg: |
855be7f
to
ec846ff
Compare
LGTM, expect currently we are just setting error message to dataProtect and dataReady, but based on various conditions from VR, it should be populated and show messages accordingly. |
Setting the error message is the purpose of change. In normal condition we know the exact state so using the message from the VR is not very useful. We may simplify the code later to just use the message from the VR. Another issue duplicating the content of DataReady and DataProtected conditions, which does not seems right, but this is not a new issue, and changing it is not in the scope of this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than two small comments, the rest looks good to me.
conditionMissing = conditionState("missing") | ||
conditionStale = conditionState("stale") | ||
conditionUnknown = conditionState("unknown") | ||
conditionKnown = conditionState("known") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while reviewing, I got confused mixing conditionKnown
and conditionUnknown
. I think it is better to rename conditionKnown
to conditionExists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conditionExists is confusing with conditionMissing, but I can find a term that different from Unknown.
v.log.Info(fmt.Sprintf("%s (VolRep: %s/%s)", errorMsg, volRep.GetName(), volRep.GetNamespace())) | ||
) (bool, conditionState) { | ||
conditionMet, condState, errorMsg := isVRConditionMet(volRep, volrep.ConditionValidated, metav1.ConditionTrue) | ||
if !conditionMet && condState != conditionMissing { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still log the errorMsg
when the condState
is conditionMissing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue, since we log the defaultMsg (like we do for other conditions), we will log:
"VolumeReplication resource not validated"
Which is confusing when using csi-addons < 0.10.0. For example when we Completed=True, which means that everything is good. So I think the best way is to keep the current behavior, log only if the condition exists.
@nirs can we get one more approval on this and merge it? |
It was added in failed state (Status=False, Reason=PrerequisiteNotMet) by default which is wrong. The issue was hidden since we check the condition only when deleting the VRG. We want to inspect the condition when the when the VRG is live, so we can report the condition status in the protected pvcs conditions. Signed-off-by: Nir Soffer <[email protected]>
Based on the messages addded in csi-addons/kubernetes-csi-addons#691. We want to propagate the error messages to the protected pvcs conditions. Signed-off-by: Nir Soffer <[email protected]>
When a VR condition is not met, we set the protected PVC condition message using the error message returned from isVRConditionMet(). When using csi-addons > 0.10.0, we use now the message from the condition instead of the default message. Since the Validated condition is not reported by older version of csi-addons, and we must wait until the Validated condition status is known when VRG is deleted, isVRConditionMet() returns now also the state of the condition, which can be: - missing: condition not found - stale: observed generation does not match object generation - unknown: the special "Unknown" value - known: status is True or False When we validate the Validate condition we have these cases: - Condition is missing: continue to next condition. - Condition is met: continue to the next condition. - Condition not met and its status is False. This VR will never complete and it is safe to delete since replication will never start. If VRG is deleted, we return true since the VR reached the designed state. Otherwise we return false. In this case we updated the protected pvc condition with the message from the VR condition. - Condition is not met and is stale or unnown: we need to check again later. There is no point to check the completed condition since a VR cannot complete without validation.In this case we updated the protected pvc condition with the message generated by isVRConditionMet() for stale or unknown conditions. Example protected pvc DataReady condition with propagated message when VR validation failed: conditions: - lastTransitionTime: "2024-11-06T15:33:06Z" message: 'failed to meet prerequisite: rpc error: code = FailedPrecondition desc = system is not in a state required for the operation''s execution: failed to enable mirroring on image "replicapool/csi-vol-fe2ca7f8-713c-4c51-bf52-0d4b2c11d329": parent image "replicapool/csi-snap-e2114105-b451-469b-ad97-eb3cbe2af54e" is not enabled for mirroring' observedGeneration: 1 reason: Error status: "False" type: DataReady Signed-off-by: Nir Soffer <[email protected]>
Verify that when VR Validated condition is False, the condition message is propagated to the protected pvc DataReady condition message. To make this easy to test, we have a new helper for waiting until protected pvc condition status and message are updated to specified values. We propagate the same message to the DataProtected condition, but this behavior seems like unwanted behavior that should change and is not worth testing. Signed-off-by: Nir Soffer <[email protected]>
We don't need more approvals. We kept it to give time for more reviewers. |
When a VR condition is not met, we set the protected PVC condition
message using the error message returned from isVRConditionMet(). When
using csi-addons > 0.10.0, we use now the message from the condition
instead of the default message.
Since the Validated condition is not reported by older version of
csi-addons, and we must wait until the Validated condition status is
known when VRG is deleted, isVRConditionMet() returns now also the state
of the condition, which can be:
When we validate the Validate condition we have these cases:
Condition is missing: continue to next condition.
Condition is met: continue to the next condition.
Condition not met and its status is False. This VR will never
complete and it is safe to delete since replication will never start.
If VRG is deleted, we return true since the VR reached the designed
state. Otherwise we return false. In this case we updated the
protected pvc condition with the message from the VR condition.
Condition is not met and is stale or unnown: we need to check again
later. There is no point to check the completed condition since a VR
cannot complete without validation.In this case we updated the
protected pvc condition with the message generated by
isVRConditionMet() for stale or unknown conditions.
Example protected pvc DataReady condition with propagated message when
VR validation failed:
Note
Using development build of csi-addons adding for testing.
We don't depend on the csi-addon release to merge this fix, but it will
be affective only when using csi-addons including this change:
csi-addons/kubernetes-csi-addons#691